Related papers: Benchmarking Simulation-Based Inference

A Literature Survey of Benchmark Functions For Global Optimization Problems

Test functions are important to validate and compare the performance of optimization algorithms. There have been many test or benchmark functions reported in the literature; however, there is no standard list or set of benchmark functions.…

Artificial Intelligence · Computer Science 2013-08-20 Momin Jamil, Xin-She Yang

Online and Interactive Bayesian Inference Debugging

Probabilistic programming is a rapidly developing programming paradigm which enables the formulation of Bayesian models as programs and the automation of posterior inference. It facilitates the development of models and conducting Bayesian…

Software Engineering · Computer Science 2025-10-31 Nathanael Nussbaumer, Markus Böck, Jürgen Cito

MemoryBench: A Benchmark for Memory and Continual Learning in LLM Systems

Scaling up data, parameters, and test-time computation has been the mainstream method to improve LLM systems (LLMsys), but their upper bounds are almost reached due to the gradual depletion of high-quality data and marginal gains obtained…

Machine Learning · Computer Science 2026-05-12 Qingyao Ai, Yichen Tang, Changyue Wang, Jianming Long, Weihang Su, Yiqun Liu

Sensitivity to Serial Dependency of Input Processes: A Robust Approach

Procedures in assessing the impact of serial dependency on performance analysis are usually built on parametrically specified models. In this paper, we propose a robust, nonparametric approach to carry out this assessment, by computing the…

Methodology · Statistics 2016-06-22 Henry Lam

A benchmarking procedure for quantum networks

We propose network benchmarking: a procedure to efficiently benchmark the quality of a quantum network link connecting quantum processors in a quantum network. This procedure is based on the standard randomized benchmarking protocol and…

Quantum Physics · Physics 2021-03-02 Jonas Helsen, Stephanie Wehner

AIBench Scenario: Scenario-distilling AI Benchmarking

Modern real-world application scenarios like Internet services consist of a diversity of AI and non-AI modules with huge code sizes and long and complicated execution paths, which raises serious benchmarking or evaluating challenges. Using…

Performance · Computer Science 2021-09-07 Wanling Gao, Fei Tang, Jianfeng Zhan, Xu Wen, Lei Wang, Zheng Cao, Chuanxin Lan, Chunjie Luo, Xiaoli Liu, Zihan Jiang

Validation of Approximate Likelihood and Emulator Models for Computationally Intensive Simulations

Complex phenomena in engineering and the sciences are often modeled with computationally intensive feed-forward simulations for which a tractable analytic likelihood does not exist. In these cases, it is sometimes necessary to estimate an…

Methodology · Statistics 2020-06-18 Niccolò Dalmasso, Ann B. Lee, Rafael Izbicki, Taylor Pospisil, Ilmun Kim, Chieh-An Lin

Likelihood-Free Inference via Structured Score Matching

In many statistical problems, the data distribution is specified through a generative process for which the likelihood function is analytically intractable, yet inference on the associated model parameters remains of primary interest. We…

Methodology · Statistics 2026-04-01 Haoyu Jiang, Yuexi Wang, Yun Yang

Performance Measurement for Deep Bayesian Neural Network

Deep Bayesian neural networks have attracted great attention in recent years since they combine the benefits of deep neural networks and probability theory. Because of this, the network can make predictions and quantify the uncertainty of the…

Machine Learning · Computer Science 2019-03-25 Yikuan Li, Yajie Zhu

How good is good? Probabilistic benchmarks and nanofinance+

Benchmarks are standards that make it possible to identify opportunities for improvement among comparable units. This study suggests a 2-step methodology for calculating probabilistic benchmarks in noisy data sets: (i) double-hyperbolic undersampling…

Statistical Finance · Quantitative Finance 2021-03-03 Rolando Gonzales Martinez

Data-driven satisficing measure and ranking

We propose a computational framework for real-time risk assessment and prioritization of random outcomes without prior information on probability distributions. The basic model is built on the satisficing measure (SM), which yields a…

Optimization and Control · Mathematics 2018-07-03 Wenjie Huang

Program Analysis of Probabilistic Programs

Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. No single inference…

Programming Languages · Computer Science 2022-04-15 Maria I. Gorinova

Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation

Quantitative Artificial Intelligence (AI) Benchmarks have emerged as fundamental tools for evaluating the performance, capability, and safety of AI models and systems. Currently, they shape the direction of AI development and are playing an…

Artificial Intelligence · Computer Science 2025-05-27 Maria Eriksson, Erasmo Purificato, Arman Noroozian, Joao Vinagre, Guillaume Chaslot, Emilia Gomez, David Fernandez-Llorca

Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches

In the NLP community, recent years have seen a surge of research activities that address machines' ability to perform deep language understanding which goes beyond what is explicitly stated in text, rather relying on reasoning and knowledge…

Computation and Language · Computer Science 2020-02-27 Shane Storks, Qiaozi Gao, Joyce Y. Chai

Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model

Clinical researchers often select among and evaluate risk prediction models using standard machine learning metrics based on confusion matrices. However, if these models are used to allocate interventions to patients, standard metrics…

Machine Learning · Statistics 2020-06-03 Alejandro Schuler, Aashish Bhardwaj, Vincent Liu

Variational Inference for Nonlinear Inverse Problems via Neural Net Kernels: Comparison to Bayesian Neural Networks, Application to Topology Optimization

Inverse problems and, in particular, inferring unknown or latent parameters from data are ubiquitous in engineering simulations. A predominant viewpoint in identifying unknown parameters is Bayesian inference where both prior information…

Computation · Statistics 2022-08-31 Vahid Keshavarzzadeh, Robert M. Kirby, Akil Narayan

A framework for benchmarking clustering algorithms

The evaluation of clustering algorithms can involve running them on a variety of benchmark problems, and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate…

Machine Learning · Computer Science 2023-10-27 Marek Gagolewski

Statistical Inference for Matching Decisions via Matrix Completion under Dependent Missingness

This paper studies decision-making and statistical inference for two-sided matching markets via matrix completion. In contrast to the independent sampling assumed in classical matrix completion literature, the observed entries, which arise…

Methodology · Statistics 2025-10-31 Congyuan Duan, Wanteng Ma, Dong Xia, Kan Xu

OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics

As models become increasingly sophisticated, conventional algorithm benchmarks are increasingly saturated, underscoring the need for more challenging benchmarks to guide future improvements in algorithmic reasoning. This paper introduces…

Artificial Intelligence · Computer Science 2025-06-13 Yaoming Zhu, Junxin Wang, Yiyang Li, Lin Qiu, ZongYu Wang, Jun Xu, Xuezhi Cao, Yuhuai Wei, Mingshi Wang, Xunliang Cai, Rong Ma

A New Mathematical Model for the Efficiency Calculation

During the past sixty years, considerable effort has been devoted to productive efficiency. Such endeavours provided an extensive bibliography on this subject, culminating in two main methods, named the Stochastic Frontier Analysis…

Optimization and Control · Mathematics 2019-08-14 Anibal Galindro, Micael Santos, Delfim F. M. Torres, Ana Marta-Costa