Pascal Pernot
This short study presents an opportunistic approach to a (more) reliable validation method for the average calibration of prediction uncertainties. Considering that variance-based calibration metrics (ZMS, NLL, RCE...) are quite sensitive to the…
Average calibration of the (variance-based) prediction uncertainties of machine learning regression tasks can be tested in two ways: one is to estimate the calibration error (CE) as the difference between the mean squared error (MSE) and…
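As an illustration of the two tests named in this abstract, here is a minimal Python sketch. It assumes that MSE is the mean squared error, that the quantity subtracted from it is the mean variance (MV, the mean squared uncertainty), and that ZMS is the mean of the squared z-scores E/u, to be compared to 1. The function name `calibration_stats` and the synthetic data are hypothetical and are not taken from the paper.

```python
import random


def calibration_stats(errors, uncertainties):
    """Return (MSE, MV, CE, ZMS) for paired errors and standard uncertainties.

    CE = MSE - MV should be close to 0 and ZMS close to 1 for
    average-calibrated uncertainties. All uncertainties must be > 0.
    """
    if not errors or len(errors) != len(uncertainties):
        raise ValueError("errors and uncertainties must be non-empty and of equal length")
    n = len(errors)
    mse = sum(e * e for e in errors) / n            # mean squared error
    mv = sum(u * u for u in uncertainties) / n      # mean variance
    ce = mse - mv                                   # calibration error
    zms = sum((e / u) ** 2 for e, u in zip(errors, uncertainties)) / n
    return mse, mv, ce, zms


# Synthetic, calibrated-by-construction data: each error is drawn from a
# normal distribution whose standard deviation is the stated uncertainty.
rng = random.Random(0)
u_synth = [rng.uniform(0.5, 2.0) for _ in range(5000)]
e_synth = [rng.gauss(0.0, ui) for ui in u_synth]
mse, mv, ce, zms = calibration_stats(e_synth, u_synth)
print(f"MSE={mse:.3f}  MV={mv:.3f}  CE={ce:.3f}  ZMS={zms:.3f}")
```

Both statistics are plain sample means, so a few very large errors or very small uncertainties can dominate them. This is why the sketch reports the two side by side rather than a single number.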
Some popular Machine Learning Uncertainty Quantification (ML-UQ) calibration statistics do not have predefined reference values and are mostly used in comparative studies. In consequence, calibration is almost never validated and the…
Reliable uncertainty quantification (UQ) in machine learning (ML) regression tasks is becoming the focus of many studies in materials and chemical science. It is now well understood that average calibration is insufficient, and most studies…
Binwise Variance Scaling (BVS) has recently been proposed as a post hoc recalibration method for prediction uncertainties of machine learning regression problems that is capable of more efficient corrections than uniform variance (or…
Post hoc recalibration of prediction uncertainties of machine learning regression problems by isotonic regression might present a problem for bin-based calibration error statistics (e.g. ENCE). Isotonic regression often produces…
The Expected Normalized Calibration Error (ENCE) is a popular calibration statistic used in Machine Learning to assess the quality of prediction uncertainties for regression problems. Estimation of the ENCE is based on the binning of…
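To make the binning step concrete, here is a minimal Python sketch of an ENCE estimate. It assumes the commonly used definition: data are sorted by uncertainty, split into equal-size bins, and the relative gap between the root mean variance (RMV) and the root mean squared error (RMSE) is averaged over the bins. The function name `ence`, the default `n_bins`, and the equal-count binning scheme are illustrative choices, not the paper's own implementation.

```python
import math


def ence(errors, uncertainties, n_bins=10):
    """Estimate the ENCE with equal-count bins ordered by uncertainty.

    For each bin: RMV = sqrt(mean(u^2)), RMSE = sqrt(mean(E^2)).
    ENCE = mean over bins of |RMV - RMSE| / RMV.
    All uncertainties must be > 0 so that RMV is never zero.
    """
    if len(errors) != len(uncertainties):
        raise ValueError("errors and uncertainties must have the same length")
    n = len(errors)
    if n_bins < 1 or n < n_bins:
        raise ValueError("need at least one point per bin")
    # Sort the (uncertainty, error) pairs by increasing uncertainty.
    pairs = sorted(zip(uncertainties, errors))
    total = 0.0
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        rmv = math.sqrt(sum(u * u for u, _ in chunk) / len(chunk))
        rmse = math.sqrt(sum(e * e for _, e in chunk) / len(chunk))
        total += abs(rmv - rmse) / rmv
    return total / n_bins
```

Because each bin contributes an absolute value, the estimate cannot be negative and generally changes with `n_bins`, so values obtained with different bin counts are not directly comparable.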
The practice of uncertainty quantification (UQ) validation, notably in machine learning for the physico-chemical sciences, rests on several graphical methods (scatter plots, calibration curves, reliability diagrams and confidence curves)…
Confidence curves are used in uncertainty validation to assess how large uncertainties ($u_{E}$) are associated with large errors ($E$). An oracle curve is commonly used as a reference to estimate the quality of the tested datasets. The…
Validation of prediction uncertainty (PU) is becoming an essential task for modern computational chemistry. Designed to quantify the reliability of predictions in meteorology, the calibration-sharpness (CS) framework is now widely used to…
We review the alternative proposals introduced recently in the literature to update the standard formula to estimate the uncertainty on the mean of repeated measurements, and we compare their performances on synthetic examples with normal…
Uncertainty quantification (UQ) in computational chemistry (CC) is still in its infancy. Very few CC methods are designed to provide a confidence level on their predictions, and most users still rely improperly on the mean absolute error as…
PURPOSE: To develop an automated algorithm allowing extraction of quantitative corneal transparency parameters from clinical spectral-domain OCT images. To establish a representative dataset of normative transparency values from healthy…
The distribution of errors is a central object in the assessment and benchmarking of computational chemistry methods. The popular and often blind use of the mean unsigned error as a benchmarking statistic leads to ignore distributions…
Quantum machine learning models have been gaining significant traction within atomistic simulation communities. Conventionally, relative model performances are being assessed and compared using learning curves (prediction error vs. training…
The comparison of benchmark error sets is an essential tool for the evaluation of theories in computational chemistry. The standard ranking of methods by their Mean Unsigned Error is unsatisfactory for several reasons linked to the…
With the upcoming launch of space telescopes dedicated to the study of exoplanets, the Atmospheric Remote-Sensing Infrared Exoplanet Large-survey (ARIEL) and the James Webb Space Telescope (JWST), a new era is opening in…
In the first part of this study (Paper I), we introduced the systematic improvement probability (SIP) as a tool to assess the level of improvement on absolute errors to be expected when switching between two computational chemistry methods.…
Computational chemistry has become an important complement to experimental measurements. In order to choose among the multitude of the existing approximations, it is common to use benchmark data sets, and to issue recommendations based on…
Thanks to the instruments onboard the Cassini spacecraft, it has been known that Titan's ionospheric chemistry is complex and the molecular growth is initiated through the photolysis of the most abundant species directly in the upper…