Related papers: Benchmarking Simulation-Based Inference
Uncertainty in optimization is often represented as stochastic parameters in the optimization model. In Predict-Then-Optimize approaches, predictions of a machine learning model are used as values for such parameters, effectively…
Scientists and engineers employ stochastic numerical simulators to model empirically observed phenomena. In contrast to purely statistical models, simulators express scientific principles that provide powerful inductive biases, improve…
Benchmarks are pivotal in driving AI progress, and invalid benchmark questions frequently undermine their reliability. Manually identifying and correcting errors among thousands of benchmark questions is not only infeasible but also a…
Complex computer simulations are commonly required for accurate data modelling in many scientific disciplines, making statistical inference challenging due to the intractability of the likelihood evaluation for the observed data.…
We present a framework that utilizes quantum algorithms, an architecture aware quantum noise model and an ideal simulator to benchmark quantum computers. The benchmark metrics highlight the difference between the quantum computer evolution…
Application Programming Interfaces (APIs), which encapsulate the implementation of specific functions as interfaces, greatly improve the efficiency of modern software development. As numbers of APIs spring up nowadays, developers can hardly…
The objective comparison of Reinforcement Learning (RL) algorithms is notoriously complex as outcomes and benchmarking of performances of different RL approaches are critically sensitive to environmental design, reward structures, and…
In empirical software engineering, benchmarks can be used for comparing different methods, techniques and tools. However, the recent ACM SIGSOFT Empirical Standards for Software Engineering Research do not include an explicit checklist for…
Ranking entities such as algorithms, devices, methods, or models based on their performances, while accounting for application-specific preferences, is a challenge. To address this challenge, we establish the foundations of a universal…
The number of proposed iterative optimization heuristics is growing steadily, and with this growth, there have been many points of discussion within the wider community. One particular criticism that is raised towards many new algorithms is…
Bayesian Neural Networks (BNNs) offer a principled and natural framework for proper uncertainty quantification in the context of deep learning. They address the typical challenges associated with conventional deep learning methods, such as…
Developing large language models is expensive and involves making decisions with small experiments, typically by evaluating on large, multi-task evaluation suites. In this work, we analyze specific properties which make a benchmark more…
Sampling-based search, a simple paradigm for utilizing test-time compute, involves generating multiple candidate responses and selecting the best one -- typically by having models self-verify each response for correctness. In this paper, we…
We provide methods to validate and compare sensor outputs, or inference algorithms applied to sensor data, by adapting statistical scoring rules. The reported output should either be in the form of a prediction interval or of a parameter…
A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known…
Algorithms are increasingly used to aid with high-stakes decision making. Yet, their predictive ability frequently exhibits systematic variation across population subgroups. To assess the trade-off between fairness and accuracy using finite…
In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment while using some prior knowledge that is accessed beforehand. Many BRL algorithms have already been…
The increasing versatility of language models (LMs) has given rise to a new class of benchmarks that comprehensively assess a broad range of capabilities. Such benchmarks are associated with massive computational costs, extending to…
Benchmarking is how the performance of a computing system is determined. Surprisingly, even for classical computers this is not a straightforward process. One must choose the appropriate benchmark and metrics to extract meaningful results.…
We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such…