Related papers: AI-driven Java Performance Testing: Balancing Resu…

Towards effective assessment of steady state performance in Java software: Are we there yet?

Microbenchmarking is a widely used form of performance testing in Java software. A microbenchmark repeatedly executes a small chunk of code while collecting measurements related to its performance. Due to Java Virtual Machine optimizations,…

Software Engineering · Computer Science 2022-11-30 Luca Traini, Vittorio Cortellessa, Daniele Di Pompeo, Michele Tucci

The Future of Software Testing: AI-Powered Test Case Generation and Validation

Software testing is a crucial phase in the software development lifecycle (SDLC), ensuring that products meet necessary functional, performance, and quality benchmarks before release. Despite advancements in automation, traditional methods…

Software Engineering · Computer Science 2026-03-10 Mohammad Baqar, Rajat Khanda

$\mu$OpTime: Statically Reducing the Execution Time of Microbenchmark Suites Using Stability Metrics

Performance regressions have a tremendous impact on the quality of software. One way to catch regressions before they reach production is executing performance tests before deployment, e.g., using microbenchmarks, which measure performance…

Software Engineering · Computer Science 2025-10-22 Nils Japke, Martin Grambow, Christoph Laaber, David Bermbach

A Kernel-Based Approach for Accurate Steady-State Detection in Performance Time Series

This paper addresses the challenge of accurately detecting the transition from the warmup phase to the steady state in performance metric time series, which is a critical step for effective benchmarking. The goal is to introduce a method…

Performance · Computer Science 2025-11-17 Martin Beseda, Vittorio Cortellessa, Daniele Di Pompeo, Luca Traini, Michele Tucci

AI Application Benchmarking: Power-Aware Performance Analysis for Vision and Language Models

Artificial Intelligence (AI) workloads drive a rapid expansion of high-performance computing (HPC) infrastructures and increase their power and energy demands towards a critical level. AI benchmarks representing state-of-the-art workloads…

Performance · Computer Science 2026-03-18 Martin Mayr, Sebastian Wind, Lukas Schröder, Georg Hager, Harald Köstler, Gerhard Wellein

A Rosetta Stone for AI Benchmarks

Most AI benchmarks saturate within years or even months after they are introduced, making it hard to study long-run trends in AI capabilities. To address this challenge, we build a statistical framework that stitches benchmarks together,…

Artificial Intelligence · Computer Science 2025-12-02 Anson Ho, Jean-Stanislas Denain, David Atanasov, Samuel Albanie, Rohin Shah

Fault-Tolerant Evaluation for Sample-Efficient Model Performance Estimators

In the era of Model-as-a-Service, organizations increasingly rely on third-party AI models for rapid deployment. However, the dynamic nature of emerging AI applications, the continual introduction of new datasets, and the growing number of…

Machine Learning · Computer Science 2026-02-10 Zihan Zhu, Yanqiu Wu, Qiongkai Xu

Generative AI in Software Testing: Current Trends and Future Directions

This paper investigates current software testing systems and explores how artificial intelligence, specifically Generative AI, can be integrated to enhance these systems. It begins by examining different types of AI systems and focuses on…

Software Engineering · Computer Science 2026-03-03 Tanish Singla, Qusay H. Mahmoud

An Empirical Study on Method-Level Performance Evolution in Open-Source Java Projects

Performance is a critical quality attribute in software development, yet the impact of method-level code changes on performance evolution remains poorly understood. While developers often make intuitive assumptions about which types of…

Software Engineering · Computer Science 2025-08-12 Kaveh Shahedi, Nana Gyambrah, Heng Li, Maxime Lamothe, Foutse Khomh

PerfDetectiveAI -- Performance Gap Analysis and Recommendation in Software Applications

PerfDetectiveAI, a conceptual framework for performance gap analysis and suggestion in software applications, is introduced in this research. For software developers, retaining a competitive edge and providing exceptional user experiences…

Software Engineering · Computer Science 2023-06-13 Vivek Basavegowda Ramu

ConsumerBench: Benchmarking Generative AI Applications on End-User Devices

The recent shift in Generative AI (GenAI) applications from cloud-only environments to end-user devices introduces new challenges in resource management, system efficiency, and user experience. This paper presents ConsumerBench, a…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-24 Yile Gu, Rohan Kadekodi, Hoang Nguyen, Keisuke Kamahori, Yiyu Liu, Baris Kasikci

AI-powered software testing tools: A systematic review and empirical assessment of their features and limitations

Context: The rise of Artificial Intelligence (AI) in software engineering has led to the development of AI-powered test automation tools, promising improved efficiency, reduced maintenance effort, and enhanced defect-detection. However, a…

Software Engineering · Computer Science 2025-05-02 Vahid Garousi, Nithin Joy, Zafar Jafarov, Alper Buğra Keleş, Sevde Değirmenci, Ece Özdemir, Ryan Zarringhalami

Benchmarking Blunders and Things That Go Bump in the Night

Benchmarking, by which I mean any computer system that is driven by a controlled workload, is the ultimate in performance testing and simulation. Aside from being a form of institutionalized cheating, it also offers countless opportunities…

Performance · Computer Science 2009-09-29 Neil J. Gunther

PACE: A Program Analysis Framework for Continuous Performance Prediction

Software development teams establish elaborate continuous integration pipelines containing automated test cases to accelerate the development process of software. Automated tests help to verify the correctness of code modifications…

Software Engineering · Computer Science 2023-12-05 Chidera Biringa, Gokhan Kul

Efficiency Matters: Speeding Up Automated Testing with GUI Rendering Inference

Due to the importance of Android app quality assurance, many automated GUI testing tools have been developed. Although the test algorithms have been improved, the impact of GUI rendering has been overlooked. On the one hand, setting a long…

Software Engineering · Computer Science 2023-02-28 Sidong Feng, Mulong Xie, Chunyang Chen

Benchmarking AI-based data assimilation to advance data-driven global weather forecasting

Research on Artificial Intelligence (AI)-based Data Assimilation (DA) is expanding rapidly. However, the absence of an objective, comprehensive, and real-world benchmark hinders the fair comparison of diverse methods. Here, we introduce…

Machine Learning · Computer Science 2026-02-17 Wuxin Wang, Weicheng Ni, Ben Fei, Tao Han, Lilan Huang, Taikang Yuan, Xiaoyong Li, Lei Bai, Boheng Duan, Kaijun Ren

Test Automation Maturity Improves Product Quality -- Quantitative Study of Open Source Projects Using Continuous Integration

The popularity of continuous integration (CI) is increasing as a result of market pressure to release product features or updates frequently. The ability of CI to deliver quality at speed depends on reliable test automation. In this paper,…

Software Engineering · Computer Science 2022-02-10 Yuqing Wang, Mika Mäntylä, Zihao Liu, Jouni Markkula

Benchmarking Energy and Latency in TinyML: A Novel Method for Resource-Constrained AI

The rise of IoT has increased the need for on-edge machine learning, with TinyML emerging as a promising solution for resource-constrained devices such as MCUs. However, evaluating their performance remains challenging due to diverse…

Machine Learning · Computer Science 2025-12-01 Pietro Bartoli, Christian Veronesi, Andrea Giudici, David Siorpaes, Diana Trojaniello, Franco Zappa

Testing Framework for Black-box AI Models

With widespread adoption of AI models for important decision making, ensuring reliability of such models remains an important challenge. In this paper, we present an end-to-end generic framework for testing AI Models which performs…

Machine Learning · Computer Science 2021-02-12 Aniya Aggarwal, Samiulla Shaikh, Sandeep Hans, Swastik Haldar, Rema Ananthanarayanan, Diptikalyan Saha

RealBench: Benchmarking Data-Driven Numerical Weather Forecasting Under Operational Conditions and Extreme Event Challenges

Accurate evaluation of weather forecasting models is critical for their reliable deployment in real-world applications. However, existing benchmarks predominantly rely on reanalysis products such as ERA5, which are generated through delayed…

Machine Learning · Computer Science 2026-05-26 Ruize Li, Zhibin Wen, Tao Han, Hao Chen, Fenghua Ling, Wei Zhang, Song Guo, Lei Bai