Related papers: Benchmarking Simulation-Based Inference

200 papers

Uncertainty in optimization is often represented as stochastic parameters in the optimization model. In Predict-Then-Optimize approaches, predictions of a machine learning model are used as values for such parameters, effectively…

Machine Learning · Computer Science 2025-12-03 Pieter Smet

Scientists and engineers employ stochastic numerical simulators to model empirically observed phenomena. In contrast to purely statistical models, simulators express scientific principles that provide powerful inductive biases, improve…

Benchmarks are pivotal in driving AI progress, and invalid benchmark questions frequently undermine their reliability. Manually identifying and correcting errors among thousands of benchmark questions is not only infeasible but also a…

Complex computer simulations are commonly required for accurate data modelling in many scientific disciplines, making statistical inference challenging due to the intractability of the likelihood evaluation for the observed data.…

Machine Learning · Statistics 2019-10-02 Pablo de Castro, Tommaso Dorigo

We present a framework that utilizes quantum algorithms, an architecture aware quantum noise model and an ideal simulator to benchmark quantum computers. The benchmark metrics highlight the difference between the quantum computer evolution…

Quantum Physics · Physics 2021-12-20 Konstantinos Georgopoulos, Clive Emary, Paolo Zuliani

Application Programming Interfaces (APIs), which encapsulate the implementation of specific functions as interfaces, greatly improve the efficiency of modern software development. As numbers of APIs spring up nowadays, developers can hardly…

Software Engineering · Computer Science 2021-12-24 Yun Peng, Shuqing Li, Wenwei Gu, Yichen Li, Wenxuan Wang, Cuiyun Gao, Michael Lyu

The objective comparison of Reinforcement Learning (RL) algorithms is notoriously complex as outcomes and benchmarking of performances of different RL approaches are critically sensitive to environmental design, reward structures, and…

Machine Learning · Computer Science 2026-03-19 Sinan Ibrahim, Grégoire Ouerdane, Hadi Salloum, Henni Ouerdane, Stefan Streif, Pavel Osinenko

In empirical software engineering, benchmarks can be used for comparing different methods, techniques and tools. However, the recent ACM SIGSOFT Empirical Standards for Software Engineering Research do not include an explicit checklist for…

Software Engineering · Computer Science 2021-05-04 Wilhelm Hasselbring

Ranking entities such as algorithms, devices, methods, or models based on their performances, while accounting for application-specific preferences, is a challenge. To address this challenge, we establish the foundations of a universal…

Machine Learning · Computer Science 2026-03-25 Sébastien Piérard, Anaïs Halin, Anthony Cioppa, Adrien Deliège, Marc Van Droogenbroeck

The number of proposed iterative optimization heuristics is growing steadily, and with this growth, there have been many points of discussion within the wider community. One particular criticism that is raised towards many new algorithms is…

Neural and Evolutionary Computing · Computer Science 2024-02-16 Diederick Vermetten, Carola Doerr, Hao Wang, Anna V. Kononova, Thomas Bäck

Bayesian Neural Networks (BNNs) offer a principled and natural framework for proper uncertainty quantification in the context of deep learning. They address the typical challenges associated with conventional deep learning methods, such as…

Computation · Statistics 2024-11-13 Zahra Moslemi, Yang Meng, Shiwei Lan, Babak Shahbaba

Developing large language models is expensive and involves making decisions with small experiments, typically by evaluating on large, multi-task evaluation suites. In this work, we analyze specific properties which make a benchmark more…

Computation and Language · Computer Science 2025-08-19 David Heineman, Valentin Hofmann, Ian Magnusson, Yuling Gu, Noah A. Smith, Hannaneh Hajishirzi, Kyle Lo, Jesse Dodge

Sampling-based search, a simple paradigm for utilizing test-time compute, involves generating multiple candidate responses and selecting the best one -- typically by having models self-verify each response for correctness. In this paper, we…

Machine Learning · Computer Science 2025-02-21 Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi

We provide methods to validate and compare sensor outputs, or inference algorithms applied to sensor data, by adapting statistical scoring rules. The reported output should either be in the form of a prediction interval or of a parameter…

Data Analysis, Statistics and Probability · Physics 2015-07-07 A. D. Martin, T. C. A. Molteno, M. Parry

A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known…

Artificial Intelligence · Computer Science 2018-11-14 Zhenyu A. Liao, Charupriya Sharma, James Cussens, Peter van Beek

Algorithms are increasingly used to aid with high-stakes decision making. Yet, their predictive ability frequently exhibits systematic variation across population subgroups. To assess the trade-off between fairness and accuracy using finite…

Econometrics · Economics 2025-06-17 Yiqi Liu, Francesca Molinari

In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment while using some prior knowledge that is accessed beforehand. Many BRL algorithms have already been…

Artificial Intelligence · Computer Science 2016-09-28 Michael Castronovo, Damien Ernst, Adrien Couetoux, Raphael Fonteneau

The increasing versatility of language models (LMs) has given rise to a new class of benchmarks that comprehensively assess a broad range of capabilities. Such benchmarks are associated with massive computational costs, extending to…

Computation and Language · Computer Science 2024-04-02 Yotam Perlitz, Elron Bandel, Ariel Gera, Ofir Arviv, Liat Ein-Dor, Eyal Shnarch, Noam Slonim, Michal Shmueli-Scheuer, Leshem Choshen

Benchmarking is how the performance of a computing system is determined. Surprisingly, even for classical computers this is not a straightforward process. One must choose the appropriate benchmark and metrics to extract meaningful results.…

Quantum Physics · Physics 2021-05-07 Salonik Resch, Ulya R. Karpuzcu

We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such…

Numerical Analysis · Mathematics 2016-02-17 Philipp Hennig, Michael A. Osborne, Mark Girolami