Related papers: Evaluating Probabilistic Classifiers: The Triptych

Evaluating probabilistic classifiers: Reliability diagrams and score decompositions revisited

A probability forecast or probabilistic classifier is reliable or calibrated if the predicted probabilities are matched by ex post observed frequencies, as examined visually in reliability diagrams. The classical binning and counting…

Methodology · Statistics 2021-08-26 Timo Dimitriadis, Tilmann Gneiting, Alexander I. Jordan

A User-Focused Approach to Evaluating Probabilistic and Categorical Forecasts

A user-focused verification approach for evaluating probability forecasts of binary outcomes (also known as probabilistic classifiers) is demonstrated that is (i) based on proper scoring rules, (ii) focuses on user decision thresholds, and…

Applications · Statistics 2024-03-25 Nicholas Loveday, Robert Taggart, Mohammadreza Khanarmuei

Evaluating model calibration in classification

Probabilistic classifiers output a probability distribution on target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their…

Machine Learning · Computer Science 2019-02-20 Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas B. Schön

From Uncertainty to Precision: Enhancing Binary Classifier Performance through Calibration

The assessment of binary classifier performance traditionally centers on discriminative ability using metrics, such as accuracy. However, these metrics often disregard the model's inherent uncertainty, especially when dealing with sensitive…

Machine Learning · Computer Science 2024-02-13 Agathe Fernandes Machado, Arthur Charpentier, Emmanuel Flachaire, Ewen Gallic, François Hu

Calibrate: Interactive Analysis of Probabilistic Model Output

Analyzing classification model performance is a crucial task for machine learning practitioners. While practitioners often use count-based metrics derived from confusion matrices, like accuracy, many applications, such as weather…

Human-Computer Interaction · Computer Science 2022-07-29 Peter Xenopoulos, Joao Rulff, Luis Gustavo Nonato, Brian Barr, Claudio Silva

Never mind the metrics -- what about the uncertainty? Visualising confusion matrix metric distributions

There are strong incentives to build models that demonstrate outstanding predictive performance on various datasets and benchmarks. We believe these incentives risk a narrow focus on models and on the performance metrics used to evaluate…

Machine Learning · Computer Science 2022-06-07 David Lovell, Dimity Miller, Jaiden Capra, Andrew Bradley

Classifier Calibration: with application to threat scores in cybersecurity

This paper explores the calibration of a classifier output score in binary classification problems. A calibrator is a function that maps the arbitrary classifier score, of a testing observation, onto $[0,1]$ to provide an estimate for the…

Machine Learning · Computer Science 2022-04-29 Waleed A. Yousef, Issa Traore, William Briguglio

Probabilistic Scores of Classifiers, Calibration is not Enough

In binary classification tasks, accurate representation of probabilistic predictions is essential for various real-world applications such as predicting payment defaults or assessing medical risks. The model must then be well-calibrated to…

Machine Learning · Computer Science 2024-08-08 Agathe Fernandes Machado, Arthur Charpentier, Emmanuel Flachaire, Ewen Gallic, François Hu

Cross-calibration of probabilistic forecasts

When providing probabilistic forecasts for uncertain future events, it is common to strive for calibrated forecasts, that is, the predictive distribution should be compatible with the observed outcomes. Several notions of calibration are…

Methodology · Statistics 2015-05-21 Christof Strähl, Johanna F. Ziegel

Calibration of Machine Learning Classifiers for Probability of Default Modelling

Binary classification is highly used in credit scoring in the estimation of probability of default. The validation of such predictive models is based both on rank ability, and also on calibration (i.e. how accurately the probabilities…

Econometrics · Economics 2017-10-25 Pedro G. Fonseca, Hugo D. Lopes

Metrics of calibration for probabilistic predictions

Predictions are often probabilities; e.g., a prediction could be for precipitation tomorrow, but with only a 30% chance. Given such probabilistic predictions together with the actual outcomes, "reliability diagrams" help detect and diagnose…

Statistics Theory · Mathematics 2022-11-15 Imanol Arrieta-Ibarra, Paman Gujral, Jonathan Tannen, Mark Tygert, Cherie Xu

Regression Diagnostics meets Forecast Evaluation: Conditional Calibration, Reliability Diagrams, and Coefficient of Determination

Model diagnostics and forecast evaluation are two sides of the same coin. A common principle is that fitted or predicted distributions ought to be calibrated or reliable, ideally in the sense of auto-calibration, where the outcome is a…

Methodology · Statistics 2024-09-27 Tilmann Gneiting, Johannes Resin

From Classification Accuracy to Proper Scoring Rules: Elicitability of Probabilistic Top List Predictions

In the face of uncertainty, the need for probabilistic assessments has long been recognized in the literature on forecasting. In classification, however, comparative evaluation of classifiers often focuses on predictions specifying a single…

Methodology · Statistics 2023-05-31 Johannes Resin

Bias-corrected methods for estimating the receiver operating characteristic surface of continuous diagnostic tests

Verification bias is a well-known problem that may occur in the evaluation of predictive ability of diagnostic tests. When a binary disease status is considered, various solutions can be found in the literature to correct inference based on…

Methodology · Statistics 2023-04-10 Khanh To Duc, Monica Chiogna, Gianfranco Adimari

A Review of the Receiver Operating Characteristic Curve and a Proof About the Area Beneath It

The Receiver Operating Characteristic (ROC) curve of a binary classifier has often been utilized to measure the performance of the classifier. The area beneath this curve is used in particular because of its quoted probabilistic…

Machine Learning · Computer Science 2026-05-05 Steven Redolfi

Recipes for Calibration Checks in Safety-Critical Applications

Safety-critical prediction systems, such as autonomous vehicles, weather forecasters, and medical monitors, commonly rely on probabilistic forecasters. These forecasters make predictions about possible future outcomes, and their quality and…

Methodology · Statistics 2026-04-30 Romeo Valentin

Uniform reliability tests for forecasting systems with small lead time

A long noted difficulty when assessing the reliability (or calibration) of forecasting systems is that reliability, in general, is a hypothesis not about a finite dimensional parameter but about an entire functional relationship. A…

Data Analysis, Statistics and Probability · Physics 2020-12-09 Jochen Bröcker

Murphy Diagrams: Forecast Evaluation of Expected Shortfall

Motivated by the Basel 3 regulations, recent studies have considered joint forecasts of Value-at-Risk and Expected Shortfall. A large family of scoring functions can be used to evaluate forecast performance in this context. However, little…

Risk Management · Quantitative Finance 2017-05-15 Johanna F. Ziegel, Fabian Krüger, Alexander Jordan, Fernando Fasciati

Improving the Presentation and Understanding of Risk Models

The key concepts (calibration, discrimination, and discordance) important in understanding and comparing risk models are best conveyed graphically. To illustrate this, models predicting death and acute kidney injury in a large cohort of PCI…

Quantitative Methods · Quantitative Biology 2015-04-21 Ralph H. Stern, Dean E. Smith, Hitinder S. Gurm

The Manokhin Probability Matrix: A Diagnostic Framework for Classifier Probability Quality

The Brier score conflates two distinct properties of probabilistic predictions: reliability (calibration error) and resolution (discriminatory power). We introduce the Manokhin Probability Matrix, a BCG-style two-dimensional diagnostic…

Machine Learning · Statistics 2026-05-06 Valery Manokhin