Related papers: Fault-Tolerant Evaluation for Sample-Efficient Mod…

Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework

Commonly, AI or machine learning (ML) models are evaluated on benchmark datasets. This practice supports innovative methodological research, but benchmark performance can be poorly correlated with performance in real-world applications -- a…

Machine Learning · Computer Science 2024-06-18 Olivier Binette, Jerome P. Reiter

On the Properties of Simulation-based Estimators in High Dimensions

Considering the increasing size of available data, the need for statistical methods that control the finite sample bias is growing. This is mainly due to the frequent settings where the number of variables is large and allowed to increase…

Statistics Theory · Mathematics 2018-10-12 Stéphane Guerrier, Mucyo Karemera, Samuel Orso, Maria-Pia Victoria-Feser

Enhancing Performance of Explainable AI Models with Constrained Concept Refinement

The trade-off between accuracy and interpretability has long been a challenge in machine learning (ML). This tension is particularly significant for emerging interpretable-by-design methods, which aim to redesign ML algorithms for…

Machine Learning · Computer Science 2025-05-28 Geyu Liang, Senne Michielssen, Salar Fattahi

Evaluating Supervised Machine Learning Models: Principles, Pitfalls, and Metric Selection

The evaluation of supervised machine learning models is a critical stage in the development of reliable predictive systems. Despite the widespread availability of machine learning libraries and automated workflows, model assessment is often…

Machine Learning · Computer Science 2026-04-16 Xuanyan Liu, Ignacio Cabrera Martin, Marcello Trovati, Xiaolong Xu, Nikolaos Polatidis

Beyond Point Estimates: Distributional Uncertainty in Machine Learning Performance Evaluation

Machine learning models are often evaluated using point estimates of performance metrics such as accuracy, F1 score, or mean squared error. Such summaries fail to capture the inherent variability induced by stochastic elements of the…

Machine Learning · Computer Science 2026-05-13 Christoph Lehmann, Yahor Paromau

Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees

Selecting artificial intelligence (AI) models, such as large language models (LLMs), from multiple candidates requires accurate performance estimation. This is ideally achieved through empirical evaluations involving abundant real-world…

Machine Learning · Statistics 2025-12-03 Sangwoo Park, Matteo Zecchin, Osvaldo Simeone

On Misbehaviour and Fault Tolerance in Machine Learning Systems

Machine learning (ML) provides us with numerous opportunities, allowing ML systems to adapt to new situations and contexts. At the same time, this adaptability raises uncertainties concerning the run-time product quality or dependability,…

Software Engineering · Computer Science 2022-10-18 Lalli Myllyaho, Mikko Raatikainen, Tomi Männistö, Jukka K. Nurminen, Tommi Mikkonen

Developing a Dataset-Adaptive, Normalized Metric for Machine Learning Model Assessment: Integrating Size, Complexity, and Class Imbalance

Traditional metrics like accuracy, F1-score, and precision are frequently used to evaluate machine learning models; however, they may not be sufficient for evaluating performance on tiny, unbalanced, or high-dimensional datasets. A…

Machine Learning · Computer Science 2024-12-11 Serzhan Ossenov

Evaluating Model Robustness and Stability to Dataset Shift

As the use of machine learning in high impact domains becomes widespread, the importance of evaluating safety has increased. An important aspect of this is evaluating how robust a model is to changes in setting or population, which…

Machine Learning · Computer Science 2021-03-16 Adarsh Subbaswamy, Roy Adams, Suchi Saria

Active Testing: Sample-Efficient Model Evaluation

We introduce a new framework for sample-efficient model evaluation that we call active testing. While approaches like active learning reduce the number of labels needed for model training, existing literature largely ignores the cost of…

Machine Learning · Statistics 2021-06-15 Jannik Kossen, Sebastian Farquhar, Yarin Gal, Tom Rainforth

Model Agnostic Explainable Selective Regression via Uncertainty Estimation

With the wide adoption of machine learning techniques, requirements have evolved beyond sheer high performance, often requiring models to be trustworthy. A common approach to increase the trustworthiness of such systems is to allow them to…

Machine Learning · Computer Science 2023-11-16 Andrea Pugnana, Carlos Mougan, Dan Saattrup Nielsen

Model selection for estimation of causal parameters

A popular technique for selecting and tuning machine learning estimators is cross-validation. Cross-validation evaluates overall model fit, usually in terms of predictive accuracy. In causal inference, the optimal choice of estimator…

Methodology · Statistics 2021-07-07 Dominik Rothenhäusler

Adaptive debiased machine learning using data-driven model selection techniques

Debiased machine learning estimators for smooth functionals in nonparametric models can exhibit substantial variability and instability, often leading practitioners to instead rely on parametric or semiparametric working models. Such…

Methodology · Statistics 2026-03-20 Lars van der Laan, Marco Carone, Alex Luedtke, Mark van der Laan

Towards Stochastic Fault-tolerant Control using Precision Learning and Active Inference

This work presents a fault-tolerant control scheme for sensory faults in robotic manipulators based on active inference. In the majority of existing schemes, a binary decision of whether a sensor is healthy (functional) or faulty is made…

Robotics · Computer Science 2022-03-30 Mohamed Baioumy, Corrado Pezzato, Carlos Hernandez Corbato, Nick Hawes, Riccardo Ferrari

Variable selection using pseudo-variables

Penalized regression has become a standard tool for model building across a wide range of application domains. Common practice is to tune the amount of penalization to trade off bias and variance or to optimize some other measure of…

Methodology · Statistics 2018-04-05 Wenhao Hu, Eric Laber, Leonard Stefanski

How to Correctly Report LLM-as-a-Judge Evaluations

Large language models (LLMs) are widely used as scalable evaluators of model responses in lieu of human annotators. However, imperfect sensitivity and specificity of the LLM judges induce bias in naive evaluation scores. We propose a simple…

Machine Learning · Computer Science 2026-02-10 Chungpa Lee, Thomas Zeng, Jongwon Jeong, Jy-yong Sohn, Kangwook Lee

Active Testing: An Efficient and Robust Framework for Estimating Accuracy

Much recent work on visual recognition aims to scale up learning to massive, noisily-annotated datasets. We address the problem of scaling up the evaluation of such models to large-scale datasets with noisy labels. Current protocols for…

Computer Vision and Pattern Recognition · Computer Science 2018-07-03 Phuc Nguyen, Deva Ramanan, Charless Fowlkes

Streamlined Framework for Agile Forecasting Model Development towards Efficient Inventory Management

This paper proposes a framework for developing forecasting models by streamlining the connections between core components of the developmental process. The proposed framework enables swift and robust integration of new datasets,…

Machine Learning · Computer Science 2023-04-14 Jonathan Hans Soeseno, Sergio González, Trista Pei-Chun Chen

Toward Better Practice of Covariate Adjustment in Analyzing Randomized Clinical Trials

In randomized clinical trials, adjustments for baseline covariates at both design and analysis stages are highly encouraged by regulatory agencies. A recent trend is to use a model-assisted approach for covariate adjustment to gain…

Methodology · Statistics 2021-07-14 Ting Ye, Jun Shao, Yanyao Yi, Qingyuan Zhao

Adaptive Fault Tolerance Mechanisms of Large Language Models in Cloud Computing Environments

With the rapid evolution of Large Language Models (LLMs) and their large-scale experimentation in cloud-computing spaces, the challenge of guaranteeing their security and efficiency in a failure scenario has become a main issue. To ensure…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-18 Yihong Jin, Ze Yang, Xinhe Xu, Yihan Zhang, Shuyang Ji