Related papers: Benchmarking Simulation-Based Inference

Automatic Reparameterisation of Probabilistic Programs

Probabilistic programming has emerged as a powerful paradigm in statistics, applied science, and machine learning: by decoupling modelling from inference, it promises to allow modellers to directly reason about the processes generating…

Machine Learning · Statistics 2019-06-10 Maria I. Gorinova, Dave Moore, Matthew D. Hoffman

URSABench: Comprehensive Benchmarking of Approximate Bayesian Inference Methods for Deep Neural Networks

While deep learning methods continue to improve in predictive accuracy on a wide range of application domains, significant issues remain with other aspects of their performance including their ability to quantify uncertainty and their…

Machine Learning · Computer Science 2020-07-10 Meet P. Vadera, Adam D. Cobb, Brian Jalaian, Benjamin M. Marlin

A critical analysis of metrics used for measuring progress in artificial intelligence

Comparing model performances on benchmark datasets is an integral part of measuring and driving progress in artificial intelligence. A model's performance on a benchmark dataset is commonly assessed based on a single or a small set of…

Artificial Intelligence · Computer Science 2021-11-09 Kathrin Blagec, Georg Dorffner, Milad Moradi, Matthias Samwald

Efficient Benchmarking Is Just Feature Selection and Multiple Regression

Efficient benchmarking techniques aim to lower the computational cost of evaluating LLMs by predicting full benchmark scores using only a subset of a benchmark's questions. By reframing this problem as an instance of multiple regression…

Machine Learning · Statistics 2026-05-26 Sam Bowyer, Acyr Locatelli, Kris Cao

Bayesian Optimization for Likelihood-Free Inference of Simulator-Based Statistical Models

Our paper deals with inferring simulator-based statistical models given some observed data. A simulator-based model is a parametrized mechanism which specifies how data are generated. It is thus also referred to as generative model. We…

Machine Learning · Statistics 2016-01-01 Michael U. Gutmann, Jukka Corander

FairlyUncertain: A Comprehensive Benchmark of Uncertainty in Algorithmic Fairness

Fair predictive algorithms hinge on both equality and trust, yet inherent uncertainty in real-world data challenges our ability to make consistent, fair, and calibrated decisions. While fairly managing predictive error has been extensively…

Machine Learning · Computer Science 2024-10-04 Lucas Rosenblatt, R. Teal Witter

A User's Guide to Calibrating Robotics Simulators

Simulators are a critical component of modern robotics research. Strategies for both perception and decision making can be studied in simulation first before deployed to real world systems, saving on time and costs. Despite significant…

Machine Learning · Computer Science 2020-11-19 Bhairav Mehta, Ankur Handa, Dieter Fox, Fabio Ramos

Simulation-based optimal Bayesian experimental design for nonlinear systems

The optimal selection of experimental conditions is essential to maximizing the value of data for inference and prediction, particularly in situations where experiments are time-consuming and expensive to conduct. We propose a general…

Machine Learning · Statistics 2012-12-04 Xun Huan, Youssef M. Marzouk

A survey of benchmarking frameworks for reinforcement learning

Reinforcement learning has recently experienced increased prominence in the machine learning community. There are many approaches to solving reinforcement learning problems with new techniques developed constantly. When solving problems…

Machine Learning · Computer Science 2020-12-14 Belinda Stapelberg, Katherine M. Malan

A Framework for Statistical Inference via Randomized Algorithms

Randomized algorithms, such as randomized sketching or stochastic optimization, are a promising approach to ease the computational burden in analyzing large datasets. However, randomized algorithms also produce non-deterministic outputs,…

Methodology · Statistics 2025-05-13 Zhixiang Zhang, Sokbae Lee, Edgar Dobriban

CausalBench: A Large-scale Benchmark for Network Inference from Single-cell Perturbation Data

Causal inference is a vital aspect of multiple scientific disciplines and is routinely applied to high-impact applications such as medicine. However, evaluating the performance of causal inference methods in real-world environments is…

Machine Learning · Computer Science 2023-07-04 Mathieu Chevalley, Yusuf Roohani, Arash Mehrjou, Jure Leskovec, Patrick Schwab

Benchmark-Driven Selection of AI: Evidence from DeepSeek-R1

Evaluation of reasoning language models gained importance after it was observed that they can combine their existing capabilities into novel traces of intermediate steps before task completion and that the traces can sometimes help them to…

Machine Learning · Computer Science 2025-08-15 Petr Spelda, Vit Stritecky

Bridging Theory and Practice: Statistical Inference for Latent Space Models of Networks

Latent space models have been widely adopted in modeling network data. Developing statistical inference for estimated model parameters enables quantifying associated uncertainty and is pivotal for downstream tasks. Despite recent progress…

Statistics Theory · Mathematics 2026-05-12 Yuang Tian, Jiajin Sun, Yinqiu He

Benchmark Design and Prior-independent Optimization

This paper compares two leading approaches for robust optimization in the models of online algorithms and mechanism design. Competitive analysis compares the performance of an online algorithm to an offline benchmark in worst-case over…

Computer Science and Game Theory · Computer Science 2020-09-09 Jason Hartline, Aleck Johnsen, Yingkai Li

Block-Bench: A Framework for Controllable and Transparent Discrete Optimization Benchmarking

We present a novel approach for constructing discrete optimization benchmarks that enables fine-grained control over problem properties, and such benchmarks can facilitate analyzing discrete algorithm behaviors. We build benchmark problems…

Neural and Evolutionary Computing · Computer Science 2026-04-09 Furong Ye, Frank Neumann, Thomas Bäck, Niki van Stein

Benchmarking Network Embedding Models for Link Prediction: Are We Making Progress?

Network embedding methods map a network's nodes to vectors in an embedding space, in such a way that these representations are useful for estimating some notion of similarity or proximity between pairs of nodes in the network. The quality…

Social and Information Networks · Computer Science 2022-02-02 Alexandru Mara, Jefrey Lijffijt, Tijl De Bie

A Novel Ranking Scheme for the Performance Analysis of Stochastic Optimization Algorithms using the Principles of Severity

Stochastic optimization algorithms have been successfully applied in several domains to find optimal solutions. Because of the ever-growing complexity of the integrated systems, novel stochastic algorithms are being proposed, which makes…

Artificial Intelligence · Computer Science 2024-06-04 Sowmya Chandrasekaran, Thomas Bartz-Beielstein

What are the best systems? New perspectives on NLP Benchmarking

In Machine Learning, a benchmark refers to an ensemble of datasets associated with one or multiple metrics together with a way to aggregate different systems performances. They are instrumental in (i) assessing the progress of new methods…

Computation and Language · Computer Science 2022-10-10 Pierre Colombo, Nathan Noiry, Ekhine Irurozki, Stephan Clemencon

Online Statistical Inference in Decision-Making with Matrix Context

The study of online decision-making problems that leverage contextual information has drawn notable attention due to their significant applications in fields ranging from healthcare to autonomous systems. In modern applications, contextual…

Machine Learning · Statistics 2025-04-22 Qiyu Han, Will Wei Sun, Yichen Zhang

Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks

The ability to train Deep Neural Networks (DNNs) with constraints is instrumental in improving the fairness of modern machine-learning models. Many algorithms have been analysed in recent years, and yet there is no standard, widely accepted…

Machine Learning · Computer Science 2026-02-19 Andrii Kliachkin, Jana Lepšová, Gilles Bareilles, Jakub Mareček