Related papers: Benchmarking Simulation-Based Inference

200 papers

Modern language models (LMs) pose a new challenge in capability assessment. Static benchmarks inevitably saturate without providing confidence in the deployment tolerances of LM-based systems, but developers nonetheless claim that their…

Software Engineering · Computer Science 2024-07-31 Michael Saxon, Ari Holtzman, Peter West, William Yang Wang, Naomi Saphra

In machine learning research, it is common to evaluate algorithms via their performance on standard benchmark datasets. While a growing body of work establishes guidelines for -- and levies criticisms at -- data and benchmarking practices…

Machine Learning · Computer Science 2024-11-01 Rachel Longjohn, Markelle Kelly, Sameer Singh, Padhraic Smyth

We study an optimization-based approach to construct statistically accurate confidence intervals for simulation performance measures under nonparametric input uncertainty. This approach computes confidence bounds from simulation runs driven…

Methodology · Statistics 2019-02-14 Henry Lam, Huajie Qian

Performance prediction, the task of estimating a system's performance without performing experiments, allows us to reduce the experimental burden caused by the combinatorial explosion of different datasets, languages, tasks, and models. In…

Computation and Language · Computer Science 2021-02-11 Zihuiwen Ye, Pengfei Liu, Jinlan Fu, Graham Neubig

Benchmarking models via classical simulations is one of the main ways to judge ideas in quantum machine learning before noise-free hardware is available. However, the huge impact of the experimental design on the results, the small scales…

Quantum Physics · Physics 2024-03-15 Joseph Bowles, Shahnawaz Ahmed, Maria Schuld

We present extensive empirical evidence showing that current Bayesian simulation-based inference algorithms can produce computationally unfaithful posterior approximations. Our results show that all benchmarked algorithms -- (Sequential)…

Machine Learning · Statistics 2022-12-06 Joeri Hermans, Arnaud Delaunoy, François Rozet, Antoine Wehenkel, Volodimir Begy, Gilles Louppe

Randomized benchmarking (RB) protocols are standard tools for characterizing quantum devices. Prior analyses of RB protocols have not provided a complete method for analyzing realistic data, resulting in a variety of ad-hoc methods. The…

Quantum Physics · Physics 2018-02-02 Ian Hincks, Joel J. Wallman, Chris Ferrie, Chris Granade, David G. Cory

While games have been used extensively as milestones to evaluate game-playing AI, there exists no standardised framework for reporting the obtained observations. As a result, it remains difficult to draw general conclusions about the…

Artificial Intelligence · Computer Science 2020-07-07 Vanessa Volz, Boris Naujoks

Performance optimization of deep learning models is conducted either manually or through automatic architecture search, or a combination of both. On the other hand, their performance strongly depends on the target hardware and how…

Machine Learning · Computer Science 2022-09-23 Vahid Partovi Nia, Alireza Ghaffari, Mahdi Zolnouri, Yvon Savaria

For scientific software, especially those used for large-scale simulations, achieving good performance and efficiently using the available hardware resources is essential. It is important to regularly perform benchmarks to ensure the…

The problem of statistical inference in its various forms has been the subject of decades-long extensive research. Most of the effort has been focused on characterizing the behavior as a function of the number of available samples, with far…

Machine Learning · Computer Science 2024-11-12 Tomer Berg, Or Ordentlich, Ofer Shayevitz

AI models are increasingly deployed in live clinical environments where they must perform reliably across complex, high-stakes workflows that standard training and validation datasets were never designed to capture. Evaluating these systems…

Artificial Intelligence · Computer Science 2026-05-12 Prasanna Desikan, Harshit Rajgarhia, Shivali Dalmia, Ananya Mantravadi

Effective imputation is a crucial preprocessing step for time series analysis. Despite the development of numerous deep learning algorithms for time series imputation, the community lacks standardized and comprehensive benchmark platforms…

Free energy calculations are rapidly becoming indispensable in structure-enabled drug discovery programs. As new methods, force fields, and implementations are developed, assessing their expected accuracy on real-world systems…

Holistic benchmarks for quantum computers are essential for testing and summarizing the performance of quantum hardware. However, holistic benchmarks -- such as algorithmic or randomized benchmarks -- typically do not predict a processor's…

Quantum Physics · Physics 2023-05-16 Daniel Hothem, Jordan Hines, Karthik Nataraj, Robin Blume-Kohout, Timothy Proctor

With the increasing deployment of machine learning models in many socially sensitive tasks, there is a growing demand for reliable and trustworthy predictions. One way to accomplish these requirements is to allow a model to abstain from…

Machine Learning · Computer Science 2024-09-19 Andrea Pugnana, Lorenzo Perini, Jesse Davis, Salvatore Ruggieri

Simulation-based inference has been popular for amortized Bayesian computation. It is typical to have more than one posterior approximation, from different inference algorithms, different architectures, or simply the randomness of…

Methodology · Statistics 2024-03-04 Yuling Yao, Bruno Régaldo-Saint Blancard, Justin Domke

Prompt optimization algorithms for Large Language Models (LLMs) excel in multi-step reasoning but still lack effective uncertainty estimation. This paper introduces a benchmark dataset to evaluate uncertainty metrics, focusing on Answer,…

Machine Learning · Computer Science 2024-12-30 Pei-Fu Guo, Yun-Da Tsai, Shou-De Lin

Benchmarking functionalities in current commercial process mining tools allow organizations to contextualize their process performance through high-level performance indicators, such as completion rate or throughput time. However, they do…

Software Engineering · Computer Science 2025-04-24 Luka Abb, Majid Rafiei, Timotheus Kampik, Jana-Rebecca Rehse

Benchmarking inference performance (speed) of Foundation Models such as Large Language Models (LLM) involves navigating a vast experimental landscape to understand the complex interactions between hardware and software components. However,…

Performance · Computer Science 2025-08-15 Shweta Salaria, Zhuoran Liu, Nelson Mimura Gonzalez