Related papers: Benchmarking Simulation-Based Inference

Continuous Optimization Benchmarks by Simulation

Benchmark experiments are required to test, compare, tune, and understand optimization algorithms. Ideally, benchmark problems closely reflect real-world problem behavior. Yet, real-world problems are not always readily available for…

Neural and Evolutionary Computing · Computer Science 2020-08-17 Martin Zaefferer , Frederik Rehbach

Near Optimal Inference for the Best-Performing Algorithm

Consider a collection of competing machine learning algorithms. Given their performance on a benchmark of datasets, we would like to identify the best performing algorithm. Specifically, which algorithm is most likely to rank highest on a…

Machine Learning · Computer Science 2025-08-08 Amichai Painsky

Benchmarking Approximate Inference Methods for Neural Structured Prediction

Exact structured inference with neural network scoring functions is computationally challenging but several methods have been proposed for approximating inference. One approach is to perform gradient descent with respect to the output…

Computation and Language · Computer Science 2019-07-09 Lifu Tu , Kevin Gimpel

Position: Benchmarking is Limited in Reinforcement Learning Research

Novel reinforcement learning algorithms, or improvements on existing ones, are commonly justified by evaluating their performance on benchmark environments and are compared to an ever-changing set of standard algorithms. However, despite…

Machine Learning · Computer Science 2024-06-25 Scott M. Jordan , Adam White , Bruno Castro da Silva , Martha White , Philip S. Thomas

Randomness as Reference: Benchmark Metric for Optimization in Engineering

Benchmarking optimization algorithms is fundamental for the advancement of computational intelligence. However, widely adopted artificial test suites exhibit limited correspondence with the diversity and complexity of real-world engineering…

Computational Engineering, Finance, and Science · Computer Science 2026-04-17 Stefan Ivić , Siniša Družeta , Luka Grbčić

A Modular Workflow for Performance Benchmarking of Neuronal Network Simulations

Modern computational neuroscience strives to develop complex network models to explain dynamics and function of brains in health and disease. This process goes hand in hand with advancements in the theory of neuronal networks and increasing…

Neurons and Cognition · Quantitative Biology 2022-10-04 Jasper Albers , Jari Pronold , Anno Christopher Kurth , Stine Brekke Vennemo , Kaveh Haghighi Mood , Alexander Patronis , Dennis Terhorst , Jakob Jordan , Susanne Kunkel , Tom Tetzlaff , Markus Diesmann , Johanna Senk

BENCHIP: Benchmarking Intelligence Processors

The increasing attention to deep learning has tremendously spurred the design of intelligence processing hardware. The variety of emerging intelligence processors requires standard benchmarks for fair comparison and system optimization (in…

Performance · Computer Science 2017-11-28 Jinhua Tao , Zidong Du , Qi Guo , Huiying Lan , Lei Zhang , Shengyuan Zhou , Lingjie Xu , Cong Liu , Haifeng Liu , Shan Tang , Allen Rush , Willian Chen , Shaoli Liu , Yunji Chen , Tianshi Chen

Does imputation matter? Benchmark for predictive models

Incomplete data are common in practical applications. Most predictive machine learning models do not handle missing values, so they require some preprocessing. Although many algorithms are used for data imputation, we do not understand the…

Machine Learning · Statistics 2020-07-07 Katarzyna Woźnica , Przemysław Biecek

The Benchmarking Epistemology: Construct Validity for Evaluating Machine Learning Models

Predictive benchmarking, the evaluation of machine learning models based on predictive performance and competitive ranking, is a central epistemic practice in machine learning research and an increasingly prominent method for scientific…

Machine Learning · Computer Science 2025-10-28 Timo Freiesleben , Sebastian Zezulka

Benchmarking Framework for Performance-Evaluation of Causal Inference Analysis

Causal inference analysis is the estimation of the effects of actions on outcomes. In the context of healthcare data, this means estimating the outcome of counterfactual treatments (i.e. including treatments that were not observed) on a…

Methodology · Statistics 2018-03-21 Yishai Shimoni , Chen Yanover , Ehud Karavani , Yaara Goldschmnidt

Nested Performance Profiles for Benchmarking Software

To compare and benchmark mathematical software, performance profiles were introduced [1]. However, it has been shown that the method is not flawless. The main issue with the performance profile is that it may rank…

Optimization and Control · Mathematics 2020-01-31 Rasoul Hekmati , Hanieh Mirhajianmoghadam

Towards Realistic Optimization Benchmarks: A Questionnaire on the Properties of Real-World Problems

Benchmarks are a useful tool for empirical performance comparisons. However, one of the main shortcomings of existing benchmarks is that it remains largely unclear how they relate to real-world problems. What does an algorithm's performance…

Neural and Evolutionary Computing · Computer Science 2020-04-15 Koen van der Blom , Timo M. Deist , Tea Tušar , Mariapia Marchi , Yusuke Nojima , Akira Oyama , Vanessa Volz , Boris Naujoks

Black-box Bayesian inference for economic agent-based models

Simulation models, in particular agent-based models, are gaining popularity in economics. The considerable flexibility they offer, as well as their capacity to reproduce a variety of empirically observed behaviours of complex systems, give…

Econometrics · Economics 2024-02-20 Joel Dyer , Patrick Cannon , J. Doyne Farmer , Sebastian Schmon

NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems

Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it…

Artificial Intelligence · Computer Science 2025-01-16 Jason Yik , Korneel Van den Berghe , Douwe den Blanken , Younes Bouhadjar , Maxime Fabre , Paul Hueber , Weijie Ke , Mina A Khoei , Denis Kleyko , Noah Pacik-Nelson , Alessandro Pierro , Philipp Stratmann , Pao-Sheng Vincent Sun , Guangzhi Tang , Shenqi Wang , Biyan Zhou , Soikat Hasan Ahmed , George Vathakkattil Joseph , Benedetto Leto , Aurora Micheli , Anurag Kumar Mishra , Gregor Lenz , Tao Sun , Zergham Ahmed , Mahmoud Akl , Brian Anderson , Andreas G. Andreou , Chiara Bartolozzi , Arindam Basu , Petrut Bogdan , Sander Bohte , Sonia Buckley , Gert Cauwenberghs , Elisabetta Chicca , Federico Corradi , Guido de Croon , Andreea Danielescu , Anurag Daram , Mike Davies , Yigit Demirag , Jason Eshraghian , Tobias Fischer , Jeremy Forest , Vittorio Fra , Steve Furber , P. Michael Furlong , William Gilpin , Aditya Gilra , Hector A. Gonzalez , Giacomo Indiveri , Siddharth Joshi , Vedant Karia , Lyes Khacef , James C. Knight , Laura Kriener , Rajkumar Kubendran , Dhireesha Kudithipudi , Shih-Chii Liu , Yao-Hong Liu , Haoyuan Ma , Rajit Manohar , Josep Maria Margarit-Taulé , Christian Mayr , Konstantinos Michmizos , Dylan R. Muir , Emre Neftci , Thomas Nowotny , Fabrizio Ottati , Ayca Ozcelikkale , Priyadarshini Panda , Jongkil Park , Melika Payvand , Christian Pehle , Mihai A. Petrovici , Christoph Posch , Alpha Renner , Yulia Sandamirskaya , Clemens JS Schaefer , André van Schaik , Johannes Schemmel , Samuel Schmidgall , Catherine Schuman , Jae-sun Seo , Sadique Sheik , Sumit Bam Shrestha , Manolis Sifalakis , Amos Sironi , Kenneth Stewart , Matthew Stewart , Terrence C. Stewart , Jonathan Timcheck , Nergis Tömen , Gianvito Urgese , Marian Verhelst , Craig M. Vineyard , Bernhard Vogginger , Amirreza Yousefzadeh , Fatima Tuz Zohora , Charlotte Frenkel , Vijay Janapa Reddi

How Benchmark Prediction from Fewer Data Misses the Mark

Large language model (LLM) evaluation is increasingly costly, prompting interest in methods that speed up evaluation by shrinking benchmark datasets. Benchmark prediction (also called efficient LLM evaluation) aims to select a small subset…

Machine Learning · Computer Science 2025-06-10 Guanhua Zhang , Florian E. Dorner , Moritz Hardt

Bench360: Benchmarking Local LLM Inference from 360 Degrees

Running LLMs locally has become increasingly common, but users face a complex design space across models, quantization levels, inference engines, and serving scenarios. Existing inference benchmarks are fragmented and focus on isolated…

Computation and Language · Computer Science 2026-01-15 Linus Stuhlmann , Mauricio Fadel Argerich , Jonathan Fürst

Accelerated Randomized Benchmarking

Quantum information processing offers promising advances for a wide range of fields and applications, provided that we can efficiently assess the performance of the control applied in candidate systems. That is, we must be able to determine…

Quantum Physics · Physics 2015-01-26 Christopher Granade , Christopher Ferrie , D. G. Cory

BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices

AI models are increasingly prevalent in high-stakes environments, necessitating thorough assessment of their capabilities and risks. Benchmarks are popular for measuring these attributes and for comparing model performance, tracking…

Artificial Intelligence · Computer Science 2024-11-21 Anka Reuel , Amelia Hardy , Chandler Smith , Max Lamparth , Malcolm Hardy , Mykel J. Kochenderfer

Benchmarking Bayesian neural networks and evaluation metrics for regression tasks

Due to the growing adoption of deep neural networks in many fields of science and engineering, modeling and estimating their uncertainties has become of primary importance. Despite the growing literature about uncertainty quantification in…

Machine Learning · Computer Science 2023-02-15 Brian Staber , Sébastien Da Veiga

Towards Robust Benchmarking of Quantum Optimization Algorithms

Benchmarking the performance of quantum optimization algorithms is crucial for identifying utility for industry-relevant use cases. Benchmarking processes vary between optimization applications and depend on user-specified goals. The…

Quantum Physics · Physics 2024-05-14 David Bucher , Nico Kraus , Jonas Blenninger , Michael Lachner , Jonas Stein , Claudia Linnhoff-Popien