Related papers: Benchmarking Simulation-Based Inference

Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework

Commonly, AI or machine learning (ML) models are evaluated on benchmark datasets. This practice supports innovative methodological research, but benchmark performance can be poorly correlated with performance in real-world applications -- a…

Machine Learning · Computer Science 2024-06-18 Olivier Binette, Jerome P. Reiter

Scientific Machine Learning Benchmarks

The breakthrough in Deep Learning neural networks has transformed the use of AI and machine learning technologies for the analysis of very large experimental datasets. These datasets are typically generated by large-scale experimental…

Machine Learning · Computer Science 2021-10-26 Jeyan Thiyagalingam, Mallikarjun Shankar, Geoffrey Fox, Tony Hey

MECHBench: A Set of Black-Box Optimization Benchmarks originated from Structural Mechanics

Benchmarking is essential for developing and evaluating black-box optimization algorithms, providing a structured means to analyze their search behavior. Its effectiveness relies on carefully selected problem sets used for evaluation. To…

Neural and Evolutionary Computing · Computer Science 2025-11-17 Iván Olarte Rodríguez, Maria Laura Santoni, Fabian Duddeck, Carola Doerr, Thomas Bäck, Elena Raponi

Classification and Bayesian Optimization for Likelihood-Free Inference

Some statistical models are specified via a data generating process for which the likelihood function cannot be computed in closed form. Standard likelihood-based inference is then not feasible but the model parameters can be inferred by…

Computation · Statistics 2015-02-20 Michael U. Gutmann, Jukka Corander, Ritabrata Dutta, Samuel Kaski

How to benchmark: the Measure-Explain-Test-Improve loop

I would like to share recommendations on how to do performance benchmarks for the purpose of computer science research evaluation. Research in my field (programming language research) often involves performance considerations, but it is…

Programming Languages · Computer Science 2026-05-05 Gabriel Scherer

Amortised Likelihood-free Inference for Expensive Time-series Simulators with Signatured Ratio Estimation

Simulation models of complex dynamics in the natural and social sciences commonly lack a tractable likelihood function, rendering traditional likelihood-based statistical inference impossible. Recent advances in machine learning have…

Machine Learning · Statistics 2022-02-24 Joel Dyer, Patrick Cannon, Sebastian M Schmon

Contemporary Symbolic Regression Methods and their Relative Performance

Many promising approaches to symbolic regression have been presented in recent years, yet progress in the field continues to suffer from a lack of uniform, robust, and transparent benchmarking standards. In this paper, we address this…

Neural and Evolutionary Computing · Computer Science 2021-08-02 William La Cava, Patryk Orzechowski, Bogdan Burlacu, Fabrício Olivetti de França, Marco Virgolin, Ying Jin, Michael Kommenda, Jason H. Moore

Top-N Recommendation Algorithms: A Quest for the State-of-the-Art

Research on recommender systems algorithms, like other areas of applied machine learning, is largely dominated by efforts to improve the state-of-the-art, typically in terms of accuracy measures. Several recent research works however…

Information Retrieval · Computer Science 2022-05-16 Vito Walter Anelli, Alejandro Bellogín, Tommaso Di Noia, Dietmar Jannach, Claudio Pomo

Evaluating prediction systems in software project estimation

Context: Software engineering has a problem in that when we empirically evaluate competing prediction systems we obtain conflicting results. Objective: To reduce the inconsistency amongst validation study results and provide a more formal…

Software Engineering · Computer Science 2021-01-15 Martin Shepperd, Stephen G. MacDonell

A Comprehensive Assessment Benchmark for Rigorously Evaluating Deep Learning Image Classifiers

Reliable and robust evaluation methods are a necessary first step towards developing machine learning models that are themselves robust and reliable. Unfortunately, current evaluation protocols typically used to assess classifiers fail to…

Machine Learning · Computer Science 2025-05-26 Michael W. Spratling

Benchmarking Bayesian quantum estimation

The quest for precision in parameter estimation is a fundamental task in different scientific areas. The relevance of this problem thus provided the motivation to develop methods for the application of quantum resources to estimation…

Quantum Physics · Physics 2024-06-18 Valeria Cimini, Emanuele Polino, Mauro Valeri, Nicolò Spagnolo, Fabio Sciarrino

The Need for Benchmarks to Advance AI-Enabled Player Risk Detection in Gambling

Artificial intelligence-based systems for player risk detection have become central to harm prevention efforts in the gambling industry. However, growing concerns around transparency and effectiveness have highlighted the absence of…

Computers and Society · Computer Science 2026-02-20 Kasra Ghaharian, Simo Dragicevic, Chris Percy, Sarah E. Nelson, W. Spencer Murch, Robert M. Heirene, Kahlil Simeon-Rose, Tracy Schrans

The Benchmark Lottery

The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods. This paper proposes the notion of "a benchmark lottery" that describes the…

Machine Learning · Computer Science 2021-07-19 Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

Towards black-box parameter estimation

Deep learning algorithms have recently shown to be a successful tool in estimating parameters of statistical models for which simulation is easy, but likelihood computation is challenging. But the success of these approaches depends on…

Machine Learning · Statistics 2024-02-20 Amanda Lenzi, Haavard Rue

Part-X: A Family of Stochastic Algorithms for Search-Based Test Generation with Probabilistic Guarantees

Requirements driven search-based testing (also known as falsification) has proven to be a practical and effective method for discovering erroneous behaviors in Cyber-Physical Systems. Despite the constant improvements on the performance and…

Machine Learning · Computer Science 2021-10-26 Giulia Pedrielli, Tanmay Khandait, Surdeep Chotaliya, Quinn Thibeault, Hao Huang, Mauricio Castillo-Effen, Georgios Fainekos

An Empirical Evaluation of a Randomized Algorithm for Probabilistic Inference

In recent years, researchers in decision analysis and artificial intelligence (AI) have used Bayesian belief networks to build models of expert opinion. Using standard methods drawn from the theory of computational complexity, workers in…

Artificial Intelligence · Computer Science 2013-04-08 R. Martin Chavez, Gregory F. Cooper

Benchmarking that Matters: Rethinking Benchmarking for Practical Impact

Benchmarking has driven scientific progress in Evolutionary Computation, yet current practices fall short of real-world needs. Widely used synthetic suites such as BBOB and CEC isolate algorithmic phenomena but poorly reflect the structure,…

Neural and Evolutionary Computing · Computer Science 2025-11-18 Anna V. Kononova, Niki van Stein, Olaf Mersmann, Thomas Bäck, Thomas Bartz-Beielstein, Tobias Glasmachers, Michael Hellwig, Sebastian Krey, Jakub Kůdela, Boris Naujoks, Leonard Papenmeier, Elena Raponi, Quentin Renau, Jeroen Rook, Lennart Schäpermeier, Diederick Vermetten, Daniela Zaharie

Discriminating Equivalent Algorithms via Relative Performance

In scientific computing, it is common that a mathematical expression can be computed by many different algorithms (sometimes over hundreds), each identifying a specific sequence of library calls. Although mathematically equivalent, those…

Performance · Computer Science 2021-09-15 Aravind Sankaran, Paolo Bientinesi

Predicting the Performance of a Computing System with Deep Networks

Predicting the performance and energy consumption of computing hardware is critical for many modern applications. This will inform procurement decisions, deployment decisions, and autonomic scaling. Existing approaches to understanding the…

Machine Learning · Computer Science 2023-02-28 Mehmet Cengiz, Matthew Forshaw, Amir Atapour-Abarghouei, Andrew Stephen McGough

Fair Benchmarking of Optimisation Applications

Quantum optimisation is emerging as a promising approach alongside classical heuristics and specialised hardware, yet its performance is often difficult to assess fairly. Traditional benchmarking methods, rooted in digital complexity…

Quantum Physics · Physics 2025-12-10 Frank Phillipson