Related papers: Text Classification Under Class Distribution Shift…

Out of Distribution Generalization in Machine Learning

Machine learning has achieved tremendous success in a variety of domains in recent years. However, a lot of these success stories have been in places where the training and the testing distributions are extremely similar to each other. In…

Machine Learning · Statistics 2021-03-05 Martin Arjovsky

Out-of-Distribution Generalization in Text Classification: Past, Present, and Future

Machine learning (ML) systems in natural language processing (NLP) face significant challenges in generalizing to out-of-distribution (OOD) data, where the test distribution differs from the training data distribution. This poses important…

Computation and Language · Computer Science 2023-05-24 Linyi Yang, Yaoxiao Song, Xuan Ren, Chenyang Lyu, Yidong Wang, Lingqiao Liu, Jindong Wang, Jennifer Foster, Yue Zhang

Trustworthy Machine Learning under Distribution Shifts

Machine Learning (ML) has been a foundational topic in artificial intelligence (AI), providing both theoretical groundwork and practical tools for its exciting advancements. From ResNet for visual recognition to Transformer for…

Machine Learning · Computer Science 2025-12-30 Zhuo Huang

Distribution-Based Categorization of Classifier Transfer Learning

Transfer Learning (TL) aims to transfer knowledge acquired in one problem, the source problem, onto another problem, the target problem, dispensing with the bottom-up construction of the target model. Due to its relevance, TL has gained…

Machine Learning · Computer Science 2017-12-07 Ricardo Gamelas Sousa, Luís A. Alexandre, Jorge M. Santos, Luís M. Silva, Joaquim Marques de Sá

Fairness Hub Technical Briefs: Definition and Detection of Distribution Shift

Distribution shift is a common situation in machine learning tasks, where the data used for training a model is different from the data the model is applied to in the real world. This issue arises across multiple technical settings: from…

Machine Learning · Computer Science 2024-05-24 Nicolas Acevedo, Carmen Cortez, Chris Brooks, Rene Kizilcec, Renzhe Yu

"What is Different Between These Datasets?" A Framework for Explaining Data Distribution Shifts

The performance of machine learning models relies heavily on the quality of input data, yet real-world applications often face significant data-related challenges. A common issue arises when curating training data or deploying models: two…

Machine Learning · Computer Science 2025-09-24 Varun Babbar, Zhicheng Guo, Cynthia Rudin

A Primer on Domain Adaptation

Standard supervised machine learning assumes that the distribution of the source samples used to train an algorithm is the same as the one of the target samples on which it is supposed to make predictions. However, as any data scientist…

Machine Learning · Computer Science 2020-02-12 Pirmin Lemberger, Ivan Panico

DOC: Deep Open Classification of Text Documents

Traditional supervised learning makes the closed-world assumption that the classes appearing in the test data must have appeared in training. This also applies to text learning or text classification. As learning is used increasingly in…

Computation and Language · Computer Science 2017-09-27 Lei Shu, Hu Xu, Bing Liu

Robust Classification under Class-Dependent Domain Shift

Investigation of machine learning algorithms robust to changes between the training and test distributions is an active area of research. In this paper we explore a special type of dataset shift which we call class-dependent domain shift.…

Machine Learning · Computer Science 2020-07-13 Tigran Galstyan, Hrant Khachatrian, Greg Ver Steeg, Aram Galstyan

Detecting Bias in Transfer Learning Approaches for Text Classification

Classification is an essential and fundamental task in machine learning, playing a cardinal role in the field of natural language processing (NLP) and computer vision (CV). In a supervised learning setting, labels are always needed for the…

Computation and Language · Computer Science 2021-02-04 Irene Li

Understanding Continual Learning Settings with Data Distribution Drift Analysis

Classical machine learning algorithms often assume that the data are drawn i.i.d. from a stationary probability distribution. Recently, continual learning emerged as a rapidly growing area of machine learning where this assumption is…

Machine Learning · Computer Science 2022-07-12 Timothée Lesort, Massimo Caccia, Irina Rish

An introduction to domain adaptation and transfer learning

In machine learning, if the training data is an unbiased sample of an underlying distribution, then the learned classification function will make accurate predictions for new samples. However, if the training data is not an unbiased sample,…

Machine Learning · Computer Science 2019-01-15 Wouter M. Kouw, Marco Loog

A Brief Review of Domain Adaptation

Classical machine learning assumes that the training and test sets come from the same distributions. Therefore, a model learned from the labeled training data is expected to perform well on the test data. However, this assumption may not…

Machine Learning · Computer Science 2020-10-12 Abolfazl Farahani, Sahar Voghoei, Khaled Rasheed, Hamid R. Arabnia

An Information-theoretic Approach to Distribution Shifts

Safely deploying machine learning models to the real world is often a challenging process. Models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere, agents trained in a…

Machine Learning · Computer Science 2021-11-02 Marco Federici, Ryota Tomioka, Patrick Forré

Handling Out-of-Distribution Data: A Survey

In the field of Machine Learning (ML) and data-driven applications, one of the significant challenges is the change in data distribution between the training and deployment stages, commonly known as distribution shift. This paper outlines…

Machine Learning · Computer Science 2025-07-30 Lakpa Tamang, Mohamed Reda Bouadjenek, Richard Dazeley, Sunil Aryal

Class Distribution Shifts in Zero-Shot Learning: Learning Robust Representations

Zero-shot learning methods typically assume that the new, unseen classes encountered during deployment come from the same distribution as the classes in the training set. However, real-world scenarios often involve class distribution…

Machine Learning · Computer Science 2024-12-11 Yuli Slavutsky, Yuval Benjamini

Thinking Beyond Distributions in Testing Machine Learned Models

Testing practices within the machine learning (ML) community have centered around assessing a learned model's predictive performance measured against a test dataset, often drawn from the same distribution as the training dataset. While…

Machine Learning · Computer Science 2021-12-07 Negar Rostamzadeh, Ben Hutchinson, Christina Greer, Vinodkumar Prabhakaran

A survey on domain adaptation theory: learning bounds and theoretical guarantees

All famous machine learning algorithms that comprise both supervised and semi-supervised learning work well only under a common assumption: the training and test data follow the same distribution. When the distribution changes, most…

Machine Learning · Computer Science 2022-07-15 Ievgen Redko, Emilie Morvant, Amaury Habrard, Marc Sebban, Younès Bennani

Machine Learning in Automated Text Categorization

The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize…

Information Retrieval · Computer Science 2021-09-21 Fabrizio Sebastiani

Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data

Continual learning is the problem of learning and retaining knowledge through time over multiple tasks and environments. Research has primarily focused on the incremental classification setting, where new tasks/classes are added at discrete…

Machine Learning · Computer Science 2021-09-23 Zhipeng Cai, Ozan Sener, Vladlen Koltun