Related papers: Analyzing Probabilistic Methods for Evaluating Age…
Evaluating failure probability for complex engineering systems is a computationally intensive task. While the Monte Carlo method is easy to implement, it converges slowly and, hence, requires numerous repeated simulations of a complex…
Real-world distributed systems and networks are often unreliable and subject to random failures of its components. Such a stochastic behavior affects adversely the complexity of optimization tasks performed routinely upon such systems, in…
Can AI agents predict whether they will succeed at a task? We study agentic uncertainty by eliciting success probability estimates before, during, and after task execution. All results exhibit agentic overconfidence: some agents that…
The problem of estimating the probability p=P(g(X<0) is considered when X represents a multivariate stochastic input of a monotone function g. First, a heuristic method to bound p is formally described, involving a specialized design of…
In many real-world continuous action domains, human agents must decide which actions to attempt and then execute those actions to the best of their ability. However, humans cannot execute actions without error. Human performance in these…
We present two Monte Carlo sampling algorithms for probabilistic inference that guarantee polynomial-time convergence for a larger class of network than current sampling algorithms provide. These new methods are variants of the known…
Estimating failure probabilities of engineering systems is an important problem in many engineering fields. In this work we consider such problems where the failure probability is extremely small (e.g $\leq10^{-10}$). In this case, standard…
Intelligent systems sometimes need to infer the probable goals of people, cars, and robots, based on partial observations of their motion. This paper introduces a class of probabilistic programs for formulating and solving these problems.…
To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed. The reasoning is sound: a reduction in variance due to…
The Monte Carlo method, proposed by Dell'Amico and Filippone, estimates a password's rank within a probabilistic model for password generation, i.e., it determines the password's strength according to this model. We propose several ideas to…
AI agents are increasingly deployed to execute important tasks. While rising accuracy scores on standard benchmarks suggest rapid progress, many agents still continue to fail in practice. This discrepancy highlights a fundamental limitation…
With origins in game theory, probabilistic values like Shapley values, Banzhaf values, and semi-values have emerged as a central tool in explainable AI. They are used for feature attribution, data attribution, data valuation, and more.…
In this paper, we consider a robust action selection problem in multi-agent systems where performance must be guaranteed when the system suffers a worst-case attack on its agents. Specifically, agents are tasked with selecting actions from…
A multitude of explainability methods and associated fidelity performance metrics have been proposed to help better understand how modern AI systems make decisions. However, much of the current work has remained theoretical -- without much…
Artificial intelligence (AI) systems accelerate medical workflows and improve diagnostic accuracy in healthcare, serving as second-opinion systems. However, the unpredictability of AI errors poses a significant challenge, particularly in…
Building on the recent empirical work of Kwa et al. (2025), I show that within their suite of research-engineering tasks the performance of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model -- a…
AI-assisted task delegation is increasingly common, yet human effort in such systems is costly and typically unobserved. Recent work by Bastani and Cachon (2025); Sambasivan et al. (2021) shows that accuracy-based payment schemes suffer…
Models of stochastic processes are widely used in almost all fields of science. Theory validation, parameter estimation, and prediction all require model calibration and statistical inference using data. However, data are almost always…
The multi-level Monte Carlo method proposed by M. Giles (2008) approximates the expectation of some functionals applied to a stochastic process with optimal order of convergence for the mean-square error. In this paper, a modified…
Benchmarks are essential for quantitatively tracking progress in AI. As AI agents become increasingly capable, researchers and practitioners have introduced agentic benchmarks to evaluate agents on complex, real-world tasks. These…