Related papers: Quantifying Performance Changes with Effect Size C…

Error Assessment of Computational Models in Chemistry

Computational models in chemistry rely on a number of approximations. The effect of such approximations on observables derived from them is often unpredictable. Therefore, it is challenging to quantify the uncertainty of a computational…

Chemical Physics · Physics 2017-04-21 Gregor N. Simm, Jonny Proppe, Markus Reiher

Quantitative Aspects of Programming Languages and Systems over the past $2^4$ years and beyond

Quantitative aspects of computation are related to the use of both physical and mathematical quantities, including time, performance metrics, probability, and measures for reliability and security. They are essential in characterizing the…

Programming Languages · Computer Science 2020-01-22 Alessandro Aldini

Beyond Point Estimates: Distributional Uncertainty in Machine Learning Performance Evaluation

Machine learning models are often evaluated using point estimates of performance metrics such as accuracy, F1 score, or mean squared error. Such summaries fail to capture the inherent variability induced by stochastic elements of the…

Machine Learning · Computer Science 2026-05-13 Christoph Lehmann, Yahor Paromau

How to benchmark: the Measure-Explain-Test-Improve loop

I would like to share recommendations on how to do performance benchmarks for the purpose of computer science research evaluation. Research in my field (programming language research) often involves performance considerations, but it is…

Programming Languages · Computer Science 2026-05-05 Gabriel Scherer

How Much is Performance Worth to Users? A Quantitative Approach

Architects and systems designers artfully balance multiple competing design constraints during the design process but are unable to translate between system metrics and end user experience. This work presents three methodologies to fill in…

Human-Computer Interaction · Computer Science 2022-05-02 Adam Hastings, Lydia B. Chilton, Simha Sethumadhavan

Experimentally efficient methods for estimating the performance of quantum measurements

Efficient methods for characterizing the performance of quantum measurements are important in the experimental quantum sciences. Ideally, one requires both a physically relevant distinguishability measure between measurement operations and…

Quantum Physics · Physics 2015-06-12 Easwar Magesan, Paola Cappellaro

Towards a Statistical Methodology to Evaluate Program Speedups and their Optimisation Techniques

The community of program optimisation and analysis, code performance evaluation, parallelisation and optimising compilation has, for many decades, published hundreds of research and engineering articles in major conferences and journals.…

Performance · Computer Science 2009-07-06 Sid Touati

An Empirical Study of Bitwise Operators Intuitiveness through Performance Metrics

Objectives: This study aims to investigate the readability and understandability of bitwise operators in programming, with the main hypothesis that there will be a difference in the performance metrics (response time and error rate) between…

Software Engineering · Computer Science 2025-10-28 Shubham Joshi

Proceedings 11th International Workshop on Quantitative Aspects of Programming Languages and Systems

Quantitative aspects of computation are important and sometimes essential in characterising the behavior and determining the properties of systems. They are related to the use of physical quantities (storage space, time, bandwidth, etc.) as…

Logic in Computer Science · Computer Science 2013-08-15 Luca Bortolussi, Herbert Wiklicky

Inductive Conformal Prediction under Data Scarcity: Exploring the Impacts of Nonconformity Measures

Conformal prediction, which makes no distributional assumptions about the data, has emerged as a powerful and reliable approach to uncertainty quantification in practical applications. The nonconformity measure used in conformal prediction…

Machine Learning · Computer Science 2024-10-15 Yuko Kato, David M. J. Tax, Marco Loog

A Test for Evaluating Performance in Human-Computer Systems

The Turing test for comparing computer performance to that of humans is well known, but, surprisingly, there is no widely used test for comparing how much better human-computer systems perform relative to humans alone, computers alone, or…

Human-Computer Interaction · Computer Science 2022-06-30 Andres Campero, Michelle Vaccaro, Jaeyoon Song, Haoran Wen, Abdullah Almaatouq, Thomas W. Malone

Prediction of High-Performance Computing Input/Output Variability and Its Application to Optimization for System Configurations

Performance variability is an important measure for a reliable high performance computing (HPC) system. Performance variability is affected by complicated interactions between numerous factors, such as CPU frequency, the number of…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-16 Li Xu, Thomas Lux, Tyler Chang, Bo Li, Yili Hong, Layne Watson, Ali Butt, Danfeng Yao, Kirk Cameron

Determinism, Complexity, and Predictability in Computer Performance

Computers are deterministic dynamical systems (CHAOS 19:033124, 2009). Among other things, that implies that one should be able to use deterministic forecast rules to predict their behavior. That statement is sometimes, but not always, true.…

Chaotic Dynamics · Physics 2013-05-24 Joshua Garland, Ryan James, Elizabeth Bradley

Learning Prediction Intervals for Model Performance

Understanding model performance on unlabeled data is a fundamental challenge of developing, deploying, and maintaining AI systems. Model performance is typically evaluated using test sets or periodic manual quality assessments, both of…

Machine Learning · Computer Science 2020-12-17 Benjamin Elder, Matthew Arnold, Anupama Murthi, Jiri Navratil

Quantifying Language Disparities in Multilingual Large Language Models

Results reported in large-scale multilingual evaluations are often fragmented and confounded by factors such as target languages, differences in experimental setups, and model choices. We propose a framework that disentangles these…

Computation and Language · Computer Science 2025-08-26 Songbo Hu, Ivan Vulić, Anna Korhonen

Non-deterministic Behavior of Ranking-based Metrics when Evaluating Embeddings

Embedding data into vector spaces is a very popular strategy of pattern recognition methods. When distances between embeddings are quantized, performance metrics become ambiguous. In this paper, we present an analysis of the ambiguity…

Computer Vision and Pattern Recognition · Computer Science 2019-02-21 Anguelos Nicolaou, Sounak Dey, Vincent Christlein, Andreas Maier, Dimosthenis Karatzas

A Statistical Analysis for Per-Instance Evaluation of Stochastic Optimizers: Avoiding Unreliable Conclusions

A key trait of stochastic optimizers is that multiple runs of the same optimizer in attempting to solve the same problem can produce different results. As a result, their performance is evaluated over several repeats, or runs, on the…

Machine Learning · Computer Science 2026-05-18 Moslem Noori, Elisabetta Valiante, Thomas Van Vaerenbergh, Masoud Mohseni, Ignacio Rozada

Predictive Performance Comparison of Decision Policies Under Confounding

Predictive models are often introduced to decision-making tasks under the rationale that they improve performance over an existing decision-making policy. However, it is challenging to compare predictive performance against an existing…

Machine Learning · Computer Science 2024-06-13 Luke Guerdan, Amanda Coston, Kenneth Holstein, Zhiwei Steven Wu

Statistical significance in choice modelling: computation, usage and reporting

This paper offers a commentary on the use of notions of statistical significance in choice modelling. We review the reasons for uncertainty in parameter estimates, provide a precise discussion on the computation of measures of uncertainty…

Econometrics · Economics 2026-05-18 Stephane Hess, Andrew Daly, Michiel Bliemer, Angelo Guevara, Ricardo Daziano, Thijs Dekker

Confidence Intervals for Evaluation of Data Mining

In data mining, when binary prediction rules are used to predict a binary outcome, many performance measures are used in a vast array of literature for the purposes of evaluation and comparison. Some examples include classification…

Machine Learning · Statistics 2025-07-08 Zheng Yuan, Wenxin Jiang