Related papers: Variable-Based Calibration for Machine Learning Cl…

Estimating Expected Calibration Errors

Uncertainty in probabilistic classifiers' predictions is a key concern when models are used to support human decision making, in broader probabilistic pipelines, or when sensitive automatic decisions have to be taken. Studies have shown that…

Machine Learning · Computer Science 2021-09-09 Nicolas Posocco, Antoine Bonnefoy

Extending confidence calibration to generalised measures of variation

We propose the Variation Calibration Error (VCE) metric for assessing the calibration of machine learning classifiers. The metric can be viewed as an extension of the well-known Expected Calibration Error (ECE) which assesses the…

Machine Learning · Computer Science 2026-02-16 Andrew Thompson, Vivek Desai

Understanding Model Calibration -- A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)

To be considered reliable, a model must be calibrated so that its confidence in each decision closely reflects its true outcome. In this blogpost we'll take a look at the most commonly used definition for calibration and then dive into a…

Methodology · Statistics 2025-09-16 Maja Pavlovic

What is Your Metric Telling You? Evaluating Classifier Calibration under Context-Specific Definitions of Reliability

Classifier calibration has received recent attention from the machine learning community due both to its practical utility in facilitating decision making, as well as the observation that modern neural network classifiers are poorly…

Machine Learning · Computer Science 2022-05-24 John Kirchenbauer, Jacob Oaks, Eric Heim

Local Calibration: Metrics and Recalibration

Probabilistic classifiers output confidence scores along with their predictions, and these confidence scores should be calibrated, i.e., they should reflect the reliability of the prediction. Confidence scores that minimize standard metrics…

Machine Learning · Computer Science 2022-08-22 Rachel Luo, Aadyot Bhatnagar, Yu Bai, Shengjia Zhao, Huan Wang, Caiming Xiong, Silvio Savarese, Stefano Ermon, Edward Schmerling, Marco Pavone

Evaluating the Quality of the Quantified Uncertainty for (Re)Calibration of Data-Driven Regression Models

In safety-critical applications data-driven models must not only be accurate but also provide reliable uncertainty estimates. This property, commonly referred to as calibration, is essential for risk-aware decision-making. In regression a…

Machine Learning · Computer Science 2026-04-23 Jelke Wibbeke, Nico Schönfisch, Sebastian Rohjans, Andreas Rauh

Reassessing How to Compare and Improve the Calibration of Machine Learning Models

A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction. This property has become increasingly important as the impact of machine…

Machine Learning · Computer Science 2025-02-25 Muthu Chidambaram, Rong Ge

Properties of the ENCE and other MAD-based calibration metrics

The Expected Normalized Calibration Error (ENCE) is a popular calibration statistic used in Machine Learning to assess the quality of prediction uncertainties for regression problems. Estimation of the ENCE is based on the binning of…

Machine Learning · Computer Science 2023-05-23 Pascal Pernot

Estimating calibration error under label shift without labels

In the face of dataset shift, model calibration plays a pivotal role in ensuring the reliability of machine learning systems. Calibration error (CE) is an indicator of the alignment between the predicted probabilities and the classifier…

Machine Learning · Computer Science 2023-12-15 Teodora Popordanoska, Gorjan Radevski, Tinne Tuytelaars, Matthew B. Blaschko

On the Usefulness of the Fit-on-the-Test View on Evaluating Calibration of Classifiers

Every uncalibrated classifier has a corresponding true calibration map that calibrates its confidence. Deviations of this idealistic map from the identity map reveal miscalibration. Such calibration errors can be reduced with many post-hoc…

Machine Learning · Computer Science 2025-02-27 Markus Kängsepp, Kaspar Valk, Meelis Kull

Better Uncertainty Calibration via Proper Scores for Classification and Beyond

With model trustworthiness being crucial for sensitive real-world applications, practitioners are putting more and more focus on improving the uncertainty calibration of deep neural networks. Calibration errors are designed to quantify the…

Machine Learning · Computer Science 2024-03-14 Sebastian G. Gruber, Florian Buettner

Evaluating Posterior Probabilities: Decision Theory, Proper Scoring Rules, and Calibration

Most machine learning classifiers are designed to output posterior probabilities for the classes given the input sample. These probabilities may be used to make the categorical decision on the class of the sample; provided as input to a…

Machine Learning · Statistics 2024-08-07 Luciana Ferrer, Daniel Ramos

A Confidence Interval for the $\ell_2$ Expected Calibration Error

Recent advances in machine learning have significantly improved prediction accuracy in various applications. However, ensuring the calibration of probabilistic predictions remains a significant challenge. Despite efforts to enhance model…

Machine Learning · Statistics 2025-08-05 Yan Sun, Pratik Chaudhari, Ian J. Barnett, Edgar Dobriban

Clustered Calibration: Representation-Aware Probability Calibration via Learned Subpopulations

Ensuring that predicted probabilities align with observed frequencies is critical in high-stakes domains such as clinical decision support, autonomous driving and financial risk assessment. Existing calibration methods typically apply a…

Machine Learning · Computer Science 2026-05-26 Tomer Lavi, Bracha Shapira, Nadav Rappoport

On Uncertainty Calibration for Equivariant Functions

Data-sparse settings such as robotic manipulation, molecular physics, and galaxy morphology classification are some of the hardest domains for deep learning. For these problems, equivariant networks can help improve modeling across…

Machine Learning · Computer Science 2026-01-30 Edward Berman, Jacob Ginesin, Marco Pacini, Robin Walters

Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference

While in-context learning with large language models (LLMs) has shown impressive performance, we have discovered a unique miscalibration behavior where both correct and incorrect predictions are assigned the same level of confidence. We…

Computation and Language · Computer Science 2024-10-04 Wei Cheng, Tianlu Wang, Yanmin Ji, Fan Yang, Keren Tan, Yiyu Zheng

Calibration of Model Uncertainty for Dropout Variational Inference

The model uncertainty obtained by variational Bayesian inference with Monte Carlo dropout is prone to miscalibration. In this paper, different logit scaling methods are extended to dropout variational inference to recalibrate model…

Machine Learning · Computer Science 2020-06-23 Max-Heinrich Laves, Sontje Ihler, Karl-Philipp Kortmann, Tobias Ortmaier

Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models

Machine unlearning aims to remove the influence of specific training data from a model while preserving reliable behavior on the remaining data, making reliable prediction and uncertainty estimation essential for evaluation. Calibration is…

Computation and Language · Computer Science 2026-05-21 Divyaksh Shukla, Ashutosh Modi

A Variational Estimator for $L_p$ Calibration Errors

Calibration, the problem of ensuring that predicted probabilities align with observed class frequencies, is a basic desideratum for reliable prediction with machine learning systems. Calibration error is…

Machine Learning · Statistics 2026-03-02 Eugène Berta, Sacha Braun, David Holzmüller, Francis Bach, Michael I. Jordan

Measuring Calibration in Deep Learning

Overconfidence and underconfidence in machine learning classifiers are measured by calibration: the degree to which the probabilities predicted for each class match the accuracy of the classifier on that prediction. How one measures…

Machine Learning · Computer Science 2020-08-11 Jeremy Nixon, Mike Dusenberry, Ghassen Jerfel, Timothy Nguyen, Jeremiah Liu, Linchuan Zhang, Dustin Tran