Related papers: Analyzing Probabilistic Methods for Evaluating Age…

Estimating Failure Probability with Neural Operator Hybrid Approach

Evaluating failure probability for complex engineering systems is a computationally intensive task. While the Monte Carlo method is easy to implement, it converges slowly and, hence, requires numerous repeated simulations of a complex…

Computation · Statistics 2023-06-27 Mujing Li, Yani Feng, Guanjie Wang

Monte-Carlo optimizations for resource allocation problems in stochastic network systems

Real-world distributed systems and networks are often unreliable and subject to random failures of their components. Such stochastic behavior adversely affects the complexity of optimization tasks performed routinely upon such systems, in…

Artificial Intelligence · Computer Science 2012-12-12 Milos Hauskrecht, Tomas Singliar

Agentic Uncertainty Reveals Agentic Overconfidence

Can AI agents predict whether they will succeed at a task? We study agentic uncertainty by eliciting success probability estimates before, during, and after task execution. All results exhibit agentic overconfidence: some agents that…

Artificial Intelligence · Computer Science 2026-02-09 Jean Kaddour, Srijan Patel, Gbètondji Dovonon, Leo Richter, Pasquale Minervini, Matt J. Kusner

Accelerated Monte Carlo estimation of failure probabilities in output of monotone computer codes

The problem of estimating the probability p = P(g(X) < 0) is considered when X represents a multivariate stochastic input of a monotone function g. First, a heuristic method to bound p is formally described, involving a specialized design of…

Statistics Theory · Mathematics 2015-03-17 Nicolas Bousquet

Approximate Estimation of High-dimension Execution Skill for Dynamic Agents in Continuous Domains

In many real-world continuous action domains, human agents must decide which actions to attempt and then execute those actions to the best of their ability. However, humans cannot execute actions without error. Human performance in these…

Artificial Intelligence · Computer Science 2024-08-21 Delma Nieves-Rivera, Christopher Archibald

Optimal Monte Carlo Estimation of Belief Network Inference

We present two Monte Carlo sampling algorithms for probabilistic inference that guarantee polynomial-time convergence for a larger class of network than current sampling algorithms provide. These new methods are variants of the known…

Artificial Intelligence · Computer Science 2013-02-18 Malcolm Pradhan, Paul Dagum

A subset multicanonical Monte Carlo method for simulating rare failure events

Estimating failure probabilities of engineering systems is an important problem in many engineering fields. In this work we consider such problems where the failure probability is extremely small (e.g., $\leq 10^{-10}$). In this case, standard…

Numerical Analysis · Mathematics 2017-05-24 Xinjuan Chen, Jinglai Li

Probabilistic programs for inferring the goals of autonomous agents

Intelligent systems sometimes need to infer the probable goals of people, cars, and robots, based on partial observations of their motion. This paper introduces a class of probabilistic programs for formulating and solving these problems.…

Artificial Intelligence · Computer Science 2017-04-19 Marco F. Cusumano-Towner, Alexey Radul, David Wingate, Vikash K. Mansinghka

Measuring Sample Quality with Stein's Method

To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed. The reasoning is sound: a reduction in variance due to…

Machine Learning · Statistics 2019-01-03 Jackson Gorham, Lester Mackey

Revisiting Monte Carlo Strength Evaluation

The Monte Carlo method, proposed by Dell'Amico and Filippone, estimates a password's rank within a probabilistic model for password generation, i.e., it determines the password's strength according to this model. We propose several ideas to…

Cryptography and Security · Computer Science 2024-08-02 Martin Stanek

Towards a Science of AI Agent Reliability

AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still continue to fail in practice. This discrepancy highlights a fundamental limitation…

Artificial Intelligence · Computer Science 2026-02-24 Stephan Rabanser, Sayash Kapoor, Peter Kirgis, Kangheng Liu, Saiteja Utpala, Arvind Narayanan

Regression-adjusted Monte Carlo Estimators for Shapley Values and Probabilistic Values

With origins in game theory, probabilistic values like Shapley values, Banzhaf values, and semi-values have emerged as a central tool in explainable AI. They are used for feature attribution, data attribution, data valuation, and more.…

Machine Learning · Computer Science 2026-01-14 R. Teal Witter, Yurong Liu, Christopher Musco

A Fast Algorithm for Robust Action Selection in Multi-Agent Systems

In this paper, we consider a robust action selection problem in multi-agent systems where performance must be guaranteed when the system suffers a worst-case attack on its agents. Specifically, agents are tasked with selecting actions from…

Multiagent Systems · Computer Science 2022-06-24 Jun Liu, Ryan K. Williams

What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods

A multitude of explainability methods and associated fidelity performance metrics have been proposed to help better understand how modern AI systems make decisions. However, much of the current work has remained theoretical -- without much…

Computer Vision and Pattern Recognition · Computer Science 2023-02-01 Julien Colin, Thomas Fel, Remi Cadene, Thomas Serre

Enhancing the Reliability of Medical AI through Expert-guided Uncertainty Modeling

Artificial intelligence (AI) systems accelerate medical workflows and improve diagnostic accuracy in healthcare, serving as second-opinion systems. However, the unpredictability of AI errors poses a significant challenge, particularly in…

Machine Learning · Computer Science 2026-04-03 Aleksei Khalin, Ekaterina Zaychenkova, Aleksandr Yugay, Andrey Goncharov, Sergey Korchagin, Alexey Zaytsev, Egor Ershov

Is there a half-life for the success rates of AI agents?

Building on the recent empirical work of Kwa et al. (2025), I show that within their suite of research-engineering tasks the performance of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model -- a…

Artificial Intelligence · Computer Science 2025-05-09 Toby Ord

Overcoming the Incentive Collapse Paradox

AI-assisted task delegation is increasingly common, yet human effort in such systems is costly and typically unobserved. Recent work by Bastani and Cachon (2025); Sambasivan et al. (2021) shows that accuracy-based payment schemes suffer…

Machine Learning · Statistics 2026-03-31 Qichuan Yin, Ziwei Su, Shuangning Li

Multifidelity multilevel Monte Carlo to accelerate approximate Bayesian parameter inference for partially observed stochastic processes

Models of stochastic processes are widely used in almost all fields of science. Theory validation, parameter estimation, and prediction all require model calibration and statistical inference using data. However, data are almost always…

Computation · Statistics 2022-09-07 David J. Warne, Thomas P. Prescott, Ruth E. Baker, Matthew J. Simpson

On the Acceleration of the Multi-Level Monte Carlo Method

The multi-level Monte Carlo method proposed by M. Giles (2008) approximates the expectation of some functionals applied to a stochastic process with optimal order of convergence for the mean-square error. In this paper, a modified…

Probability · Mathematics 2023-01-20 Kristian Debrabant, Andreas Rößler

Establishing Best Practices for Building Rigorous Agentic Benchmarks

Benchmarks are essential for quantitatively tracking progress in AI. As AI agents become increasingly capable, researchers and practitioners have introduced agentic benchmarks to evaluate agents on complex, real-world tasks. These…

Artificial Intelligence · Computer Science 2025-08-08 Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun, Andy Zhang, Shu Liu, Sasha Cui, Sayash Kapoor, Shayne Longpre, Kevin Meng, Rebecca Weiss, Fazl Barez, Rahul Gupta, Jwala Dhamala, Jacob Merizian, Mario Giulianelli, Harry Coppock, Cozmin Ududec, Jasjeet Sekhon, Jacob Steinhardt, Antony Kellermann, Sarah Schwettmann, Matei Zaharia, Ion Stoica, Percy Liang, Daniel Kang