Related papers: Benchmarking Simulation-Based Inference
The best algorithm for a computational problem generally depends on the "relevant inputs," a concept that depends on the application domain and often defies formal articulation. While there is a large literature on empirical approaches to…
Feature-based algorithm selection aims to automatically find the best algorithm from a portfolio of optimization algorithms for an unseen problem based on its landscape features. Feature-based algorithm selection has recently received attention in…
Complex scientific models where the likelihood cannot be evaluated present a challenge for statistical inference. Over the past two decades, a wide range of algorithms have been proposed for learning parameters in computationally feasible…
Insufficient performance of optimization approaches for fitting mathematical models is still a major bottleneck in systems biology. In this manuscript, the reasons and methodological challenges are summarized as well as their impact in…
Large Language Models (LLMs) have propelled groundbreaking advancements across several domains and are commonly used for text generation applications. However, the computational demands of these complex models pose significant challenges,…
Language model benchmarks are pervasive and computationally efficient proxies for real-world performance. However, many recent works find that benchmarks often fail to predict real utility. Towards bridging this gap, we introduce benchmark…
As quantum computers grow in size and scope, a question of great importance is how best to benchmark performance. Here we define a set of characteristics that any benchmark should follow -- randomized, well-defined, holistic, device…
We present a benchmark to facilitate simulated manipulation; an attempt to overcome the obstacles of physical benchmarks through the distribution of a real world, ground truth dataset. Users are given various simulated manipulation tasks…
Inference of the network structure (e.g., routing topology) and dynamics (e.g., link performance) is an essential component in many network design and management tasks. In this paper we propose a new, general framework for analyzing and…
We present a new approach for benchmarking Large Language Model (LLM) capabilities on research-level mathematics. Existing benchmarks largely rely on static, hand-curated sets of contest or textbook-style problems as proxies for…
Decision making from data involves identifying a set of attributes that contribute to effective decision making through computational intelligence. The presence of missing values greatly influences the selection of the right set of attributes…
In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different…
In small area estimation, it is sometimes necessary to use model-based methods to produce estimates in areas with little or no data. In official statistics, we often require that some aggregate of small area estimates agree with a national…
Nonlinear system identification remains an important open challenge across research and academia. Large numbers of novel approaches are published each year, each presenting improvements or extensions to existing methods. It is natural,…
Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the…
Quantum processors are now able to run quantum circuits that are infeasible to simulate classically, creating a need for benchmarks that assess a quantum processor's rate of errors when running these circuits. Here, we introduce a general…
Missing values pose a persistent challenge in modern data science. Consequently, there is an ever-growing number of publications introducing new imputation methods in various fields. While many studies compare imputation approaches, they…
Empirical and LLM-based research in model-driven engineering increasingly relies on datasets of software models, for instance, to train or evaluate machine learning techniques for modeling support. These datasets have a significant impact…
Benchmarks are used for testing new optimization algorithms and their variants to evaluate their performance. Most existing benchmarks are smooth functions. This chapter introduces ten new benchmarks with different properties, including…
Likelihood-free Bayesian inference algorithms are popular methods for calibrating the parameters of complex, stochastic models, required when the likelihood of the observed data is intractable. These algorithms characteristically rely…