Related papers: Benchmarking Simulation-Based Inference

The Pitfalls of Benchmarking in Algorithm Selection: What We Are Getting Wrong

Algorithm selection, aiming to identify the best algorithm for a given problem, plays a pivotal role in continuous black-box optimization. A common approach involves representing optimization functions using a set of features, which are…

Machine Learning · Computer Science 2025-05-13 Gašper Petelin, Gjorgjina Cenikj

Application-Oriented Performance Benchmarks for Quantum Computing

In this work we introduce an open source suite of quantum application-oriented performance benchmarks that is designed to measure the effectiveness of quantum computing hardware at executing quantum applications. These benchmarks probe a…

Quantum Physics · Physics 2025-04-15 Thomas Lubinski, Sonika Johri, Paul Varosy, Jeremiah Coleman, Luning Zhao, Jason Necaise, Charles H. Baldwin, Karl Mayer, Timothy Proctor

A Survey of Parameters Associated with the Quality of Benchmarks in NLP

Several benchmarks have been built with heavy investment in resources to track our progress in NLP. Thousands of papers published in response to those benchmarks have competed to top leaderboards, with models often surpassing human…

Computation and Language · Computer Science 2022-10-17 Swaroop Mishra, Anjana Arunkumar, Chris Bryan, Chitta Baral

Data-driven Ranking and Selection under Input Uncertainty

We consider a simulation-based Ranking and Selection (R&S) problem with input uncertainty, where unknown input distributions can be estimated using input data arriving in batches of varying sizes over time. Each time a batch arrives,…

Optimization and Control · Mathematics 2022-09-05 Di Wu, Yuhao Wang, Enlu Zhou

An Extensible Benchmark Suite for Learning to Simulate Physical Systems

Simulating physical systems is a core component of scientific computing, encompassing a wide range of physical domains and applications. Recently, there has been a surge in data-driven methods to complement traditional numerical simulations…

Machine Learning · Computer Science 2021-08-19 Karl Otness, Arvi Gjoka, Joan Bruna, Daniele Panozzo, Benjamin Peherstorfer, Teseo Schneider, Denis Zorin

Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment

Inference-time computation offers a powerful axis for scaling the performance of language models. However, naively increasing computation in techniques like Best-of-N sampling can lead to performance degradation due to reward hacking.…

Artificial Intelligence · Computer Science 2025-04-09 Audrey Huang, Adam Block, Qinghua Liu, Nan Jiang, Akshay Krishnamurthy, Dylan J. Foster

Deep Learning Methods for Proximal Inference via Maximum Moment Restriction

The No Unmeasured Confounding Assumption is widely used to identify causal effects in observational studies. Recent work on proximal inference has provided alternative identification results that succeed even in the presence of unobserved…

Machine Learning · Statistics 2022-10-17 Benjamin Kompa, David R. Bellamy, Thomas Kolokotrones, James M. Robins, Andrew L. Beam

Evaluating the Performance of Reinforcement Learning Algorithms

Performance evaluations are critical for quantifying algorithmic advances in reinforcement learning. Recent reproducibility analyses have shown that reported performance results are often inconsistent and difficult to replicate. In this…

Machine Learning · Computer Science 2020-08-14 Scott M. Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip S. Thomas

CausalReasoningBenchmark: A Real-World Benchmark for Disentangled Evaluation of Causal Identification and Estimation

Many benchmarks for automated causal inference evaluate a system's performance based on a single numerical output, such as an Average Treatment Effect (ATE). This approach conflates two distinct steps in causal analysis: identification -…

Artificial Intelligence · Computer Science 2026-05-15 Ayush Sawarni, Jiyuan Tan, Vasilis Syrgkanis

How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers

Many optimizers have been proposed for training deep neural networks, and they often have multiple hyperparameters, which make it tricky to benchmark their performance. In this work, we propose a new benchmarking protocol to evaluate both…

Machine Learning · Computer Science 2020-10-21 Yuanhao Xiong, Xuanqing Liu, Li-Cheng Lan, Yang You, Si Si, Cho-Jui Hsieh

Benchmarking the Operation of Quantum Heuristics and Ising Machines: Scoring Parameter Setting Strategies on Optimization Applications

We discuss guidelines for evaluating the performance of parameterized stochastic solvers for optimization problems, with particular attention to systems that employ novel hardware, such as digital quantum processors running variational…

Quantum Physics · Physics 2024-02-19 David E. Bernal Neira, Robin Brown, Pratik Sathe, Filip Wudarski, Marco Pavone, Eleanor G. Rieffel, Davide Venturelli

Fairness Metrics: A Comparative Analysis

Algorithmic fairness is receiving significant attention in the academic and broader literature due to the increasing use of predictive algorithms, including those based on artificial intelligence. One benefit of this trend is that algorithm…

Computers and Society · Computer Science 2020-01-28 Pratyush Garg, John Villasenor, Virginia Foggo

Exploiting the Statistics of Learning and Inference

When dealing with datasets containing a billion instances or with simulations that require a supercomputer to execute, computational resources become part of the equation. We can improve the efficiency of learning and inference by…

Machine Learning · Computer Science 2014-03-06 Max Welling

Towards Comprehensive Benchmarking Infrastructure for LLMs In Software Engineering

Large language models for code are advancing fast, yet our ability to evaluate them lags behind. Current benchmarks focus on narrow tasks and single metrics, which hide critical gaps in robustness, interpretability, fairness, efficiency,…

Software Engineering · Computer Science 2026-01-30 Daniel Rodriguez-Cardenas, Xiaochang Li, Marcos Macedo, Antonio Mastropaolo, Dipin Khati, Yuan Tian, Huajie Shao, Denys Poshyvanyk

Spintronics based Stochastic Computing for Efficient Bayesian Inference System

Bayesian inference is an effective approach for solving statistical learning problems especially with uncertainty and incompleteness. However, inference efficiencies are physically limited by the bottlenecks of conventional computing…

Emerging Technologies · Computer Science 2017-11-06 Xiaotao Jia, Jianlei Yang, Zhaohao Wang, Yiran Chen, Hai Li, Weisheng Zhao

Variational Inference for Bayesian Neural Networks under Model and Parameter Uncertainty

Bayesian neural networks (BNNs) have recently regained a significant amount of attention in the deep learning community due to the development of scalable approximate Bayesian inference techniques. There are several advantages of using a…

Machine Learning · Statistics 2023-05-02 Aliaksandr Hubin, Geir Storvik

PLANET: A Collection of Benchmarks for Evaluating LLMs' Planning Capabilities

Planning is central to agents and agentic AI. The ability to plan, e.g., creating travel itineraries within a budget, holds immense potential in both scientific and commercial contexts. Moreover, optimal plans tend to require fewer…

Artificial Intelligence · Computer Science 2025-04-22 Haoming Li, Zhaoliang Chen, Jonathan Zhang, Fei Liu

Contrastive Neural Ratio Estimation for Simulation-based Inference

Likelihood-to-evidence ratio estimation is usually cast as either a binary (NRE-A) or a multiclass (NRE-B) classification task. In contrast to the binary classification framework, the current formulation of the multiclass version has an…

Machine Learning · Statistics 2024-07-08 Benjamin Kurt Miller, Christoph Weniger, Patrick Forré

Post-hoc Models for Performance Estimation of Machine Learning Inference

Estimating how well a machine learning model performs during inference is critical in a variety of scenarios (for example, to quantify uncertainty, or to choose from a library of available models). However, the standard accuracy estimate of…

Computer Vision and Pattern Recognition · Computer Science 2021-10-07 Xuechen Zhang, Samet Oymak, Jiasi Chen

Learning All Credible Bayesian Network Structures for Model Averaging

A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known…

Artificial Intelligence · Computer Science 2020-09-01 Zhenyu A. Liao, Charupriya Sharma, James Cussens, Peter van Beek