Related papers: SuperBench: Improving Cloud AI Infrastructure Reli…

ConsumerBench: Benchmarking Generative AI Applications on End-User Devices

The recent shift in Generative AI (GenAI) applications from cloud-only environments to end-user devices introduces new challenges in resource management, system efficiency, and user experience. This paper presents ConsumerBench, a…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-24 Yile Gu, Rohan Kadekodi, Hoang Nguyen, Keisuke Kamahori, Yiyu Liu, Baris Kasikci

Duet instrumentation: An Agentic Approach to Improving Sensitivity in Cloud Service Benchmarking

Continuous cloud service performance benchmarking is essential for detecting performance bugs early before deploying them to production. However, detecting performance regressions using application benchmarks, which usually treat the system…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-19 Sebastian Koch, Nils Japke, David Bermbach

Evaluation and Incident Prevention in an Enterprise AI Assistant

Enterprise AI Assistants are increasingly deployed in domains where accuracy is paramount, making each erroneous output a potentially significant incident. This paper presents a comprehensive framework for monitoring, benchmarking, and…

Artificial Intelligence · Computer Science 2025-04-22 Akash V. Maharaj, David Arbour, Daniel Lee, Uttaran Bhattacharya, Anup Rao, Austin Zane, Avi Feller, Kun Qian, Yunyao Li

BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices

AI models are increasingly prevalent in high-stakes environments, necessitating thorough assessment of their capabilities and risks. Benchmarks are popular for measuring these attributes and for comparing model performance, tracking…

Artificial Intelligence · Computer Science 2024-11-21 Anka Reuel, Amelia Hardy, Chandler Smith, Max Lamparth, Malcolm Hardy, Mykel J. Kochenderfer

TailBench++: Flexible Multi-Client, Multi-Server Benchmarking for Latency-Critical Workloads

Cloud systems have rapidly expanded worldwide in the last decade, shifting computational tasks to cloud servers where clients submit their requests. Among cloud workloads, latency-critical applications -- characterized by high-percentile…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-07 Zhilin Li, Lucia Pons, Salvador Petit, Julio Sahuquillo, Julio Pons

When Can We Trust Deep Neural Networks? Towards Reliable Industrial Deployment with an Interpretability Guide

The deployment of AI systems in safety-critical domains, such as industrial defect inspection, autonomous driving, and medical diagnosis, is severely hampered by their lack of reliability. A single undetected erroneous prediction can lead…

Computer Vision and Pattern Recognition · Computer Science 2026-04-22 Hang-Cheng Dong, Yuhao Jiang, Yibo Jiao, Lu Zou, Kai Zheng, Bingguo Liu, Dong Ye, Guodong Liu

Challenge AI Mind: A Crowd System for Proactive AI Testing

Artificial Intelligence (AI) has burrowed into our lives in various aspects; however, without appropriate testing, deployed AI systems are often being criticized to fail in critical and embarrassing cases. Existing testing approaches mainly…

Artificial Intelligence · Computer Science 2018-10-23 Siwei Fu, Anbang Xu, Xiaotong Liu, Huimin Zhou, Rama Akkiraju

RobustBench: a standardized adversarial robustness benchmark

As a research community, we are still lacking a systematic understanding of the progress on adversarial robustness which often makes it hard to identify the most promising ideas in training robust models. A key challenge in benchmarking…

Machine Learning · Computer Science 2021-11-02 Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein

Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency

In recent years, the integration of artificial intelligence (AI) and cloud computing has emerged as a promising avenue for addressing the growing computational demands of AI applications. This paper presents a comprehensive study of…

Machine Learning · Computer Science 2023-04-28 Neelesh Mungoli

Introducing Milabench: Benchmarking Accelerators for AI

AI workloads, particularly those driven by deep learning, are introducing novel usage patterns to high-performance computing (HPC) systems that are not comprehensively captured by standard HPC benchmarks. As one of the largest academic…

Machine Learning · Computer Science 2024-11-26 Pierre Delaunay, Xavier Bouthillier, Olivier Breuleux, Satya Ortiz-Gagné, Olexa Bilaniuk, Fabrice Normandin, Arnaud Bergeron, Bruno Carrez, Guillaume Alain, Soline Blanc, Frédéric Osterrath, Joseph Viviano, Roger Creus-Castanyer, Darshan Patil, Rabiul Awal, Le Zhang

Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance

The rapid advancement of AI has expanded its capabilities across domains, yet introduced critical technical vulnerabilities, such as algorithmic bias and adversarial sensitivity, that pose significant societal risks, including…

Cryptography and Security · Computer Science 2025-08-19 Yuchu Jiang, Jian Zhao, Yuchen Yuan, Tianle Zhang, Yao Huang, Yanghao Zhang, Yan Wang, Yanshu Li, Xizhong Guo, Yusheng Zhao, Jun Zhang, Zhi Zhang, Xiaojian Lin, Yixiu Zou, Haoxuan Ma, Yuhu Shang, Yuzhi Hu, Keshu Cai, Ruochen Zhang, Boyuan Chen, Yilan Gao, Ziheng Jiao, Yi Qin, Shuangjun Du, Xiao Tong, Zhekun Liu, Yu Chen, Xuankun Rong, Rui Wang, Yejie Zheng, Zhaoxin Fan, Murat Sensoy, Hongyuan Zhang, Pan Zhou, Lei Jin, Hao Zhao, Xu Yang, Jiaojiao Zhao, Jianshu Li, Joey Tianyi Zhou, Zhi-Qi Cheng, Longtao Huang, Zhiyi Liu, Zheng Zhu, Jianan Li, Gang Wang, Qi Li, Xu-Yao Zhang, Yaodong Yang, Mang Ye, Wenqi Ren, Zhaofeng He, Hang Su, Rongrong Ni, Liping Jing, Xingxing Wei, Junliang Xing, Massimo Alioto, Shengmei Shen, Petia Radeva, Dacheng Tao, Ya-Qin Zhang, Shuicheng Yan, Chi Zhang, Zhongjiang He, Xuelong Li

Evaluating the Evaluators: Trust in Adversarial Robustness Tests

Despite significant progress in designing powerful adversarial evasion attacks for robustness verification, the evaluation of these methods often remains inconsistent and unreliable. Many assessments rely on mismatched models, unverified…

Cryptography and Security · Computer Science 2025-07-08 Antonio Emanuele Cinà, Maura Pintor, Luca Demetrio, Ambra Demontis, Battista Biggio, Fabio Roli

Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

Generative AI systems achieve impressive performance on standard benchmarks yet fail to deliver real-world utility, a disconnect we identify across 28 deployment cases spanning education, healthcare, software engineering, and law. We argue…

Machine Learning · Computer Science 2026-05-12 Ishani Mondal, Shweta Bhardwaj

Towards Runtime Verification via Event Stream Processing in Cloud Computing Infrastructures

Software bugs in cloud management systems often cause erratic behavior, hindering detection, and recovery of failures. As a consequence, the failures are not timely detected and notified, and can silently propagate through the system. To…

Software Engineering · Computer Science 2022-03-09 Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella, Angela Scibelli

PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI

Personalized AI agents rely on access to a user's digital footprint, which often includes sensitive data from private emails, chats and purchase histories. Yet this access creates a fundamental societal and privacy risk: systems lacking…

Computation and Language · Computer Science 2026-01-01 Srija Mukhopadhyay, Sathwik Reddy, Shruthi Muthukumar, Jisun An, Ponnurangam Kumaraguru

FIRED: a fine-grained robust performance diagnosis framework for cloud applications

To run a cloud application with the required service quality, operators have to continuously monitor the cloud application's run-time status, detect potential performance anomalies, and diagnose the root causes of anomalies. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-01 Ruyue Xin, Hongyun Liu, Peng Chen, Paola Grosso, Zhiming Zhao

Beyond One-Time Validation: A Framework for Adaptive Validation of Prognostic and Diagnostic AI-based Medical Devices

Prognostic and diagnostic AI-based medical devices hold immense promise for advancing healthcare, yet their rapid development has outpaced the establishment of appropriate validation methods. Existing approaches often fall short in…

Machine Learning · Computer Science 2024-09-10 Florian Hellmeier, Kay Brosien, Carsten Eickhoff, Alexander Meyer

ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation

Evaluating generative AI models is increasingly resource-intensive due to slow inference, expensive raters, and a rapidly growing landscape of models and benchmarks. We propose ProEval, a proactive evaluation framework that leverages…

Machine Learning · Computer Science 2026-04-28 Yizheng Huang, Wenjun Zeng, Aditi Kumaresan, Zi Wang

The Competence Shadow: Theory and Bounds of AI Assistance in Safety Engineering

As AI assistants become integrated into safety engineering workflows for Physical AI systems, a critical question emerges: does AI assistance improve safety analysis quality, or introduce systematic blind spots that surface only through…

Artificial Intelligence · Computer Science 2026-03-30 Umair Siddique

AFDI: A Virtualization-based Accelerated Fault Diagnosis Innovation for High Availability Computing

Fault diagnosis has attracted extensive attention for its importance in the exceedingly fault management framework for cloud virtualization, despite the fact that fault diagnosis becomes more difficult due to the increasing scalability and…

Software Engineering · Computer Science 2015-07-30 Ameen Alkasem, Hongwei Liu, Zuo Decheng, Yao Zhao