Related papers: Benchmarking Simulation-Based Inference
Benchmark experiments are required to test, compare, tune, and understand optimization algorithms. Ideally, benchmark problems closely reflect real-world problem behavior. Yet, real-world problems are not always readily available for…
Consider a collection of competing machine learning algorithms. Given their performance on a benchmark of datasets, we would like to identify the best performing algorithm. Specifically, which algorithm is most likely to rank highest on a…
Exact structured inference with neural network scoring functions is computationally challenging but several methods have been proposed for approximating inference. One approach is to perform gradient descent with respect to the output…
Novel reinforcement learning algorithms, or improvements on existing ones, are commonly justified by evaluating their performance on benchmark environments and are compared to an ever-changing set of standard algorithms. However, despite…
Benchmarking optimization algorithms is fundamental for the advancement of computational intelligence. However, widely adopted artificial test suites exhibit limited correspondence with the diversity and complexity of real-world engineering…
Modern computational neuroscience strives to develop complex network models to explain dynamics and function of brains in health and disease. This process goes hand in hand with advancements in the theory of neuronal networks and increasing…
The increasing attention on deep learning has tremendously spurred the design of intelligence processing hardware. The variety of emerging intelligence processors requires standard benchmarks for fair comparison and system optimization (in…
Incomplete data are common in practical applications. Most predictive machine learning models do not handle missing values so they require some preprocessing. Although many algorithms are used for data imputation, we do not understand the…
Predictive benchmarking, the evaluation of machine learning models based on predictive performance and competitive ranking, is a central epistemic practice in machine learning research and an increasingly prominent method for scientific…
Causal inference analysis is the estimation of the effects of actions on outcomes. In the context of healthcare data this means estimating the outcome of counter-factual treatments (i.e. including treatments that were not observed) on a…
In order to compare and benchmark the mathematical software, the performance profiles have been introduced [1]. However, it has been proved that the algorithm is not flawless. The main issue with the performance profile is that it may rank…
Benchmarks are a useful tool for empirical performance comparisons. However, one of the main shortcomings of existing benchmarks is that it remains largely unclear how they relate to real-world problems. What does an algorithm's performance…
Simulation models, in particular agent-based models, are gaining popularity in economics. The considerable flexibility they offer, as well as their capacity to reproduce a variety of empirically observed behaviours of complex systems, give…
Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it…
Large language model (LLM) evaluation is increasingly costly, prompting interest in methods that speed up evaluation by shrinking benchmark datasets. Benchmark prediction (also called efficient LLM evaluation) aims to select a small subset…
Running LLMs locally has become increasingly common, but users face a complex design space across models, quantization levels, inference engines, and serving scenarios. Existing inference benchmarks are fragmented and focus on isolated…
Quantum information processing offers promising advances for a wide range of fields and applications, provided that we can efficiently assess the performance of the control applied in candidate systems. That is, we must be able to determine…
AI models are increasingly prevalent in high-stakes environments, necessitating thorough assessment of their capabilities and risks. Benchmarks are popular for measuring these attributes and for comparing model performance, tracking…
Due to the growing adoption of deep neural networks in many fields of science and engineering, modeling and estimating their uncertainties has become of primary importance. Despite the growing literature about uncertainty quantification in…
Benchmarking the performance of quantum optimization algorithms is crucial for identifying utility for industry-relevant use cases. Benchmarking processes vary between optimization applications and depend on user-specified goals. The…