Related papers: Estimating Expected Calibration Errors

What is Your Metric Telling You? Evaluating Classifier Calibration under Context-Specific Definitions of Reliability

Classifier calibration has received recent attention from the machine learning community due both to its practical utility in facilitating decision making and to the observation that modern neural network classifiers are poorly…

Machine Learning · Computer Science 2022-05-24 John Kirchenbauer, Jacob Oaks, Eric Heim

Understanding Model Calibration -- A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)

To be considered reliable, a model must be calibrated so that its confidence in each decision closely reflects its true outcome. In this blogpost we'll take a look at the most commonly used definition for calibration and then dive into a…

Methodology · Statistics 2025-09-16 Maja Pavlovic

Evaluating the Quality of the Quantified Uncertainty for (Re)Calibration of Data-Driven Regression Models

In safety-critical applications data-driven models must not only be accurate but also provide reliable uncertainty estimates. This property, commonly referred to as calibration, is essential for risk-aware decision-making. In regression a…

Machine Learning · Computer Science 2026-04-23 Jelke Wibbeke, Nico Schönfisch, Sebastian Rohjans, Andreas Rauh

Evaluating Posterior Probabilities: Decision Theory, Proper Scoring Rules, and Calibration

Most machine learning classifiers are designed to output posterior probabilities for the classes given the input sample. These probabilities may be used to make the categorical decision on the class of the sample; provided as input to a…

Machine Learning · Statistics 2024-08-07 Luciana Ferrer, Daniel Ramos

Variable-Based Calibration for Machine Learning Classifiers

The deployment of machine learning classifiers in high-stakes domains requires well-calibrated confidence scores for model predictions. In this paper we introduce the notion of variable-based calibration to characterize calibration…

Machine Learning · Computer Science 2023-04-07 Markelle Kelly, Padhraic Smyth

Local Calibration: Metrics and Recalibration

Probabilistic classifiers output confidence scores along with their predictions, and these confidence scores should be calibrated, i.e., they should reflect the reliability of the prediction. Confidence scores that minimize standard metrics…

Machine Learning · Computer Science 2022-08-22 Rachel Luo, Aadyot Bhatnagar, Yu Bai, Shengjia Zhao, Huan Wang, Caiming Xiong, Silvio Savarese, Stefano Ermon, Edward Schmerling, Marco Pavone

A Confidence Interval for the $\ell_2$ Expected Calibration Error

Recent advances in machine learning have significantly improved prediction accuracy in various applications. However, ensuring the calibration of probabilistic predictions remains a significant challenge. Despite efforts to enhance model…

Machine Learning · Statistics 2025-08-05 Yan Sun, Pratik Chaudhari, Ian J. Barnett, Edgar Dobriban

Evaluating model calibration in classification

Probabilistic classifiers output a probability distribution on target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their…

Machine Learning · Computer Science 2019-02-20 Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas B. Schön

Calibration tests in multi-class classification: A unifying framework

In safety-critical applications a probabilistic model is usually required to be calibrated, i.e., to capture the uncertainty of its predictions accurately. In multi-class classification, calibration of the most confident predictions only is…

Machine Learning · Statistics 2022-09-30 David Widmann, Fredrik Lindsten, Dave Zachariah

Can a calibration metric be both testable and actionable?

Forecast probabilities often serve as critical inputs for binary decision making. In such settings, calibration (ensuring forecasted probabilities match empirical frequencies) is essential. Although the common…

Methodology · Statistics 2025-08-06 Raphael Rossellini, Jake A. Soloff, Rina Foygel Barber, Zhimei Ren, Rebecca Willett

Soft Mean Expected Calibration Error (SMECE): A Calibration Metric for Probabilistic Labels

The Expected Calibration Error (ECE), the dominant calibration metric in machine learning, compares predicted probabilities against empirical frequencies of binary outcomes. This is appropriate when labels are binary events. However, many…

Machine Learning · Computer Science 2026-03-17 Michael Leznik

Probability calibration for precipitation nowcasting

Reliable precipitation nowcasting is critical for weather-sensitive decision-making, yet neural weather models (NWMs) can produce poorly calibrated probabilistic forecasts. Standard calibration metrics such as the expected calibration error…

Machine Learning · Computer Science 2025-12-01 Lauri Kurki, Yaniel Cabrera, Samu Karanko

Better Uncertainty Calibration via Proper Scores for Classification and Beyond

With model trustworthiness being crucial for sensitive real-world applications, practitioners are putting more and more focus on improving the uncertainty calibration of deep neural networks. Calibration errors are designed to quantify the…

Machine Learning · Computer Science 2024-03-14 Sebastian G. Gruber, Florian Buettner

Extending confidence calibration to generalised measures of variation

We propose the Variation Calibration Error (VCE) metric for assessing the calibration of machine learning classifiers. The metric can be viewed as an extension of the well-known Expected Calibration Error (ECE) which assesses the…

Machine Learning · Computer Science 2026-02-16 Andrew Thompson, Vivek Desai

On the Calibration of Probabilistic Classifier Sets

Multi-class classification methods that produce sets of probabilistic classifiers, such as ensemble learning methods, are able to model aleatoric and epistemic uncertainty. Aleatoric uncertainty is then typically quantified via the Bayes…

Machine Learning · Statistics 2023-04-20 Thomas Mortier, Viktor Bengs, Eyke Hüllermeier, Stijn Luca, Willem Waegeman

Why Calibration Error is Wrong Given Model Uncertainty: Using Posterior Predictive Checks with Deep Learning

Within the last few years, there has been a move towards using statistical models in conjunction with neural networks with the end goal of being able to better answer the question, "what do our models know?". From this trend, classical…

Machine Learning · Computer Science 2021-12-03 Achintya Gopal

Mitigating Bias in Calibration Error Estimation

For an AI system to be reliable, the confidence it expresses in its decisions must match its accuracy. To assess the degree of match, examples are typically binned by confidence and the per-bin mean confidence and accuracy are compared.…

Machine Learning · Computer Science 2022-02-14 Rebecca Roelofs, Nicholas Cain, Jonathon Shlens, Michael C. Mozer

Estimating calibration error under label shift without labels

In the face of dataset shift, model calibration plays a pivotal role in ensuring the reliability of machine learning systems. Calibration error (CE) is an indicator of the alignment between the predicted probabilities and the classifier…

Machine Learning · Computer Science 2023-12-15 Teodora Popordanoska, Gorjan Radevski, Tinne Tuytelaars, Matthew B. Blaschko

TCE: A Test-Based Approach to Measuring Calibration Error

This paper proposes a new metric to measure the calibration error of probabilistic binary classifiers, called test-based calibration error (TCE). TCE incorporates a novel loss function based on a statistical test to examine the extent to…

Machine Learning · Statistics 2023-06-27 Takuo Matsubara, Niek Tax, Richard Mudd, Ido Guy

An Entropic Metric for Measuring Calibration of Machine Learning Models

Understanding the confidence with which a machine learning model classifies an input datum is an important, and perhaps under-investigated, concept. In this paper, we propose a new calibration metric, the Entropic Calibration Difference…

Machine Learning · Computer Science 2025-02-21 Daniel James Sumler, Lee Devlin, Simon Maskell, Richard O. Lane