Related papers: Benchmarking Simulation-Based Inference

Simulating classification models to evaluate Predict-Then-Optimize methods

Uncertainty in optimization is often represented as stochastic parameters in the optimization model. In Predict-Then-Optimize approaches, predictions of a machine learning model are used as values for such parameters, effectively…

Machine Learning · Computer Science 2025-12-03 Pieter Smet

SBI -- A toolkit for simulation-based inference

Scientists and engineers employ stochastic numerical simulators to model empirically observed phenomena. In contrast to purely statistical models, simulators express scientific principles that provide powerful inductive biases, improve…

Machine Learning · Computer Science 2020-07-23 Alvaro Tejero-Cantero, Jan Boelts, Michael Deistler, Jan-Matthis Lueckmann, Conor Durkan, Pedro J. Gonçalves, David S. Greenberg, Jakob H. Macke

Fantastic Bugs and Where to Find Them in AI Benchmarks

Benchmarks are pivotal in driving AI progress, and invalid benchmark questions frequently undermine their reliability. Manually identifying and correcting errors among thousands of benchmark questions is not only infeasible but also a…

Artificial Intelligence · Computer Science 2025-11-24 Sang Truong, Yuheng Tu, Michael Hardy, Anka Reuel, Zeyu Tang, Jirayu Burapacheep, Jonathan Perera, Chibuike Uwakwe, Ben Domingue, Nick Haber, Sanmi Koyejo

INFERNO: Inference-Aware Neural Optimisation

Complex computer simulations are commonly required for accurate data modelling in many scientific disciplines, making statistical inference challenging due to the intractability of the likelihood evaluation for the observed data.…

Machine Learning · Statistics 2019-10-02 Pablo de Castro, Tommaso Dorigo

Quantum Computer Benchmarking via Quantum Algorithms

We present a framework that utilizes quantum algorithms, an architecture aware quantum noise model and an ideal simulator to benchmark quantum computers. The benchmark metrics highlight the difference between the quantum computer evolution…

Quantum Physics · Physics 2021-12-20 Konstantinos Georgopoulos, Clive Emary, Paolo Zuliani

Revisiting, Benchmarking and Exploring API Recommendation: How Far Are We?

Application Programming Interfaces (APIs), which encapsulate the implementation of specific functions as interfaces, greatly improve the efficiency of modern software development. As numbers of APIs spring up nowadays, developers can hardly…

Software Engineering · Computer Science 2021-12-24 Yun Peng, Shuqing Li, Wenwei Gu, Yichen Li, Wenxuan Wang, Cuiyun Gao, Michael Lyu

Benchmarking Reinforcement Learning via Stochastic Converse Optimality: Generating Systems with Known Optimal Policies

The objective comparison of Reinforcement Learning (RL) algorithms is notoriously complex as outcomes and benchmarking of performances of different RL approaches are critically sensitive to environmental design, reward structures, and…

Machine Learning · Computer Science 2026-03-19 Sinan Ibrahim, Grégoire Ouerdane, Hadi Salloum, Henni Ouerdane, Stefan Streif, Pavel Osinenko

Benchmarking as Empirical Standard in Software Engineering Research

In empirical software engineering, benchmarks can be used for comparing different methods, techniques and tools. However, the recent ACM SIGSOFT Empirical Standards for Software Engineering Research do not include an explicit checklist for…

Software Engineering · Computer Science 2021-05-04 Wilhelm Hasselbring

Foundations of the Theory of Performance-Based Ranking

Ranking entities such as algorithms, devices, methods, or models based on their performances, while accounting for application-specific preferences, is a challenge. To address this challenge, we establish the foundations of a universal…

Machine Learning · Computer Science 2026-03-25 Sébastien Piérard, Anaïs Halin, Anthony Cioppa, Adrien Deliège, Marc Van Droogenbroeck

Large-scale Benchmarking of Metaphor-based Optimization Heuristics

The number of proposed iterative optimization heuristics is growing steadily, and with this growth, there have been many points of discussion within the wider community. One particular criticism that is raised towards many new algorithms is…

Neural and Evolutionary Computing · Computer Science 2024-02-16 Diederick Vermetten, Carola Doerr, Hao Wang, Anna V. Kononova, Thomas Bäck

Scaling Up Bayesian Neural Networks with Neural Networks

Bayesian Neural Networks (BNNs) offer a principled and natural framework for proper uncertainty quantification in the context of deep learning. They address the typical challenges associated with conventional deep learning methods, such as…

Computation · Statistics 2024-11-13 Zahra Moslemi, Yang Meng, Shiwei Lan, Babak Shahbaba

Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation

Developing large language models is expensive and involves making decisions with small experiments, typically by evaluating on large, multi-task evaluation suites. In this work, we analyze specific properties which make a benchmark more…

Computation and Language · Computer Science 2025-08-19 David Heineman, Valentin Hofmann, Ian Magnusson, Yuling Gu, Noah A. Smith, Hannaneh Hajishirzi, Kyle Lo, Jesse Dodge

Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification

Sampling-based search, a simple paradigm for utilizing test-time compute, involves generating multiple candidate responses and selecting the best one -- typically by having models self-verify each response for correctness. In this paper, we…

Machine Learning · Computer Science 2025-02-21 Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi

Measuring the performance of sensors that report uncertainty

We provide methods to validate and compare sensor outputs, or inference algorithms applied to sensor data, by adapting statistical scoring rules. The reported output should either be in the form of a prediction interval or of a parameter…

Data Analysis, Statistics and Probability · Physics 2015-07-07 A. D. Martin, T. C. A. Molteno, M. Parry

Finding All Bayesian Network Structures within a Factor of Optimal

A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known…

Artificial Intelligence · Computer Science 2018-11-14 Zhenyu A. Liao, Charupriya Sharma, James Cussens, Peter van Beek

Inference for an Algorithmic Fairness-Accuracy Frontier

Algorithms are increasingly used to aid with high-stakes decision making. Yet, their predictive ability frequently exhibits systematic variation across population subgroups. To assess the trade-off between fairness and accuracy using finite…

Econometrics · Economics 2025-06-17 Yiqi Liu, Francesca Molinari

Benchmarking for Bayesian Reinforcement Learning

In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment while using some prior knowledge that is accessed beforehand. Many BRL algorithms have already been…

Artificial Intelligence · Computer Science 2016-09-28 Michael Castronovo, Damien Ernst, Adrien Couetoux, Raphael Fonteneau

Efficient Benchmarking of Language Models

The increasing versatility of language models (LMs) has given rise to a new class of benchmarks that comprehensively assess a broad range of capabilities. Such benchmarks are associated with massive computational costs, extending to…

Computation and Language · Computer Science 2024-04-02 Yotam Perlitz, Elron Bandel, Ariel Gera, Ofir Arviv, Liat Ein-Dor, Eyal Shnarch, Noam Slonim, Michal Shmueli-Scheuer, Leshem Choshen

Benchmarking Quantum Computers and the Impact of Quantum Noise

Benchmarking is how the performance of a computing system is determined. Surprisingly, even for classical computers this is not a straightforward process. One must choose the appropriate benchmark and metrics to extract meaningful results.…

Quantum Physics · Physics 2021-05-07 Salonik Resch, Ulya R. Karpuzcu

Probabilistic Numerics and Uncertainty in Computations

We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such…

Numerical Analysis · Mathematics 2016-02-17 Philipp Hennig, Michael A Osborne, Mark Girolami