Related papers: Mandoline: Model Evaluation under Distribution Shi…

Correcting sampling biases via importance reweighting for spatial modeling

In machine learning models, the estimation of errors is often complex due to distribution bias, particularly in spatial data such as those found in environmental studies. We introduce an approach based on the ideas of importance sampling to…

Machine Learning · Computer Science 2023-09-15 Boris Prokhorov, Diana Koldasbayeva, Alexey Zaytsev

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction based question…

Computation and Language · Computer Science 2021-09-06 Paul Michel

A Short Survey on Importance Weighting for Machine Learning

Importance weighting is a fundamental procedure in statistics and machine learning that weights the objective function or probability distribution based on the importance of the instance in some sense. The simplicity and usefulness of the…

Machine Learning · Computer Science 2024-05-15 Masanari Kimura, Hideitsu Hino

Shifts 2.0: Extending The Dataset of Real Distributional Shifts

Distributional shift, or the mismatch between training and deployment data, is a significant obstacle to the usage of machine learning in high-stakes industrial applications, such as autonomous driving and medicine. This creates a need to…

Machine Learning · Computer Science 2022-09-16 Andrey Malinin, Andreas Athanasopoulos, Muhamed Barakovic, Meritxell Bach Cuadra, Mark J. F. Gales, Cristina Granziera, Mara Graziani, Nikolay Kartashev, Konstantinos Kyriakopoulos, Po-Jui Lu, Nataliia Molchanova, Antonis Nikitakis, Vatsal Raina, Francesco La Rosa, Eli Sivena, Vasileios Tsarsitalidis, Efi Tsompopoulou, Elena Volf

Tracking the risk of a deployed model and detecting harmful distribution shifts

When deployed in the real world, machine learning models inevitably encounter changes in the data distribution, and certain -- but not all -- distribution shifts could result in significant performance degradation. In practice, it may make…

Machine Learning · Statistics 2022-05-06 Aleksandr Podkopaev, Aaditya Ramdas

Entropic Mirror Monte Carlo

Importance sampling is a Monte Carlo method which designs estimators of expectations under a target distribution using weighted samples from a proposal distribution. When the target distribution is complex, such as multimodal distributions…

Methodology · Statistics 2026-02-04 Anas Cherradi, Yazid Janati, Alain Durmus, Sylvain Le Corff, Yohan Petetin, Julien Stoehr

Handling Out-of-Distribution Data: A Survey

In the field of Machine Learning (ML) and data-driven applications, one of the significant challenges is the change in data distribution between the training and deployment stages, commonly known as distribution shift. This paper outlines…

Machine Learning · Computer Science 2025-07-30 Lakpa Tamang, Mohamed Reda Bouadjenek, Richard Dazeley, Sunil Aryal

"Why did the Model Fail?": Attributing Model Performance Changes to Distribution Shifts

Machine learning models frequently experience performance drops under distribution shifts. The underlying cause of such shifts may be multiple simultaneous factors such as changes in data quality, differences in specific covariate…

Machine Learning · Computer Science 2023-06-07 Haoran Zhang, Harvineet Singh, Marzyeh Ghassemi, Shalmali Joshi

Beyond Point Estimates: Distributional Uncertainty in Machine Learning Performance Evaluation

Machine learning models are often evaluated using point estimates of performance metrics such as accuracy, F1 score, or mean squared error. Such summaries fail to capture the inherent variability induced by stochastic elements of the…

Machine Learning · Computer Science 2026-05-13 Christoph Lehmann, Yahor Paromau

Measuring Distributional Shifts in Text: The Advantage of Language Model-Based Embeddings

An essential part of monitoring machine learning models in production is measuring input and output data drift. In this paper, we present a system for measuring distributional shifts in natural language data and highlight and investigate…

Computation and Language · Computer Science 2023-12-06 Gyandev Gupta, Bashir Rastegarpanah, Amalendu Iyer, Joshua Rubin, Krishnaram Kenthapadi

Rethinking Importance Weighting for Transfer Learning

A key assumption in supervised learning is that training and test data follow the same probability distribution. However, this fundamental assumption is not always satisfied in practice, e.g., due to changing environments, sample selection…

Machine Learning · Computer Science 2021-12-21 Nan Lu, Tianyi Zhang, Tongtong Fang, Takeshi Teshima, Masashi Sugiyama

Distributed NLI: Learning to Predict Human Opinion Distributions for Language Reasoning

We introduce distributed NLI, a new NLU task with a goal to predict the distribution of human judgements for natural language inference. We show that by applying additional distribution estimation methods, namely, Monte Carlo (MC) Dropout,…

Computation and Language · Computer Science 2022-04-08 Xiang Zhou, Yixin Nie, Mohit Bansal

A First Step Towards Distribution Invariant Regression Metrics

Regression evaluation has been performed for decades. Some metrics have been identified to be robust against shifting and scaling of the data but considering the different distributions of data is much more difficult to address (imbalance…

Machine Learning · Computer Science 2020-09-14 Mario Michael Krell, Bilal Wehbe

Evaluating Predictive Uncertainty and Robustness to Distributional Shift Using Real World Data

Most machine learning models operate under the assumption that the training, testing and deployment data is independent and identically distributed (i.i.d.). This assumption doesn't generally hold true in a natural setting. Usually, the…

Machine Learning · Computer Science 2021-12-14 Kumud Lakara, Akshat Bhandari, Pratinav Seth, Ujjwal Verma

Understanding new tasks through the lens of training data via exponential tilting

Deploying machine learning models to new tasks is a major challenge despite the large size of the modern training datasets. However, it is conceivable that the training data can be reweighted to be more representative of the new (target)…

Machine Learning · Computer Science 2023-02-22 Subha Maity, Mikhail Yurochkin, Moulinath Banerjee, Yuekai Sun

Importance Sampling with Unequal Support

Importance sampling is often used in machine learning when training and testing data come from different distributions. In this paper we propose a new variant of importance sampling that can reduce the variance of importance sampling-based…

Machine Learning · Computer Science 2016-11-11 Philip S. Thomas, Emma Brunskill

Probabilistic Runtime Verification, Evaluation and Risk Assessment of Visual Deep Learning Systems

Despite achieving excellent performance on benchmarks, deep neural networks often underperform in real-world deployment due to sensitivity to minor, often imperceptible shifts in input data, known as distributional shifts. These shifts are…

Machine Learning · Computer Science 2025-09-25 Birk Torpmann-Hagen, Pål Halvorsen, Michael A. Riegler, Dag Johansen

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

We consider the two related problems of detecting if an example is misclassified or out-of-distribution. We present a simple baseline that utilizes probabilities from softmax distributions. Correctly classified examples tend to have greater…

Neural and Evolutionary Computing · Computer Science 2018-10-04 Dan Hendrycks, Kevin Gimpel

Explanation Shift: How Did the Distribution Shift Impact the Model?

As input data distributions evolve, the predictive performance of machine learning models tends to deteriorate. In practice, new input data tend to come without target labels. Then, state-of-the-art techniques model input data distributions…

Machine Learning · Computer Science 2023-09-08 Carlos Mougan, Klaus Broelemann, David Masip, Gjergji Kasneci, Thanassis Thiropanis, Steffen Staab

Evaluating Model Robustness and Stability to Dataset Shift

As the use of machine learning in high impact domains becomes widespread, the importance of evaluating safety has increased. An important aspect of this is evaluating how robust a model is to changes in setting or population, which…

Machine Learning · Computer Science 2021-03-16 Adarsh Subbaswamy, Roy Adams, Suchi Saria