Related papers: Sparse joint shift in multinomial classification
Deployed machine learning (ML) models often encounter new user data that differs from their training data. Therefore, estimating how well a given model might perform on the new data is an important step toward reliable ML applications. This…
Factorizable joint shift (FJS) was recently proposed as a type of dataset shift for which the complete characteristics can be estimated from feature data observations on the test dataset by a method called Joint Importance Aligning. For the…
Factorizable joint shift (FJS) represents a type of distribution shift (or dataset shift) that comprises both covariate and label shift. Recently, it has been observed that FJS actually arises from consecutive label and covariate (or vice…
In many practical applications of machine learning, a discrepancy often arises between a source distribution from which labeled training examples are drawn and a target distribution for which only unlabeled data is observed. Traditionally,…
We study the problem of class distribution estimation under dataset shift. On the training dataset, both features and class labels are observed while on the test dataset only the features can be observed. The task then is the estimation of…
Machine learning approaches commonly rely on the assumption of independent and identically distributed (i.i.d.) data. In reality, however, this assumption is almost always violated due to distribution shifts between environments. Although…
In many real applications of statistical learning, collecting sufficient training data is often expensive, time-consuming, or even unrealistic. In this case, a transfer learning approach, which aims to leverage knowledge from a…
Existing domain adaptation (DA) usually assumes the domain shift comes from either the covariates or the labels. However, in real-world applications, samples selected from different domains could have biases in both the covariates and the…
In many applications, data can be heterogeneous in the sense of spanning latent groups with different underlying distributions. When predictive models are applied to such data the heterogeneity can affect both predictive performance and…
In this paper we formally analyse the use of sparse filtering algorithms to perform covariate shift adaptation. We provide a theoretical analysis of sparse filtering by evaluating the conditions required to perform covariate shift…
Covariate shift, a widely used assumption in tackling distributional shift (when training and test distributions differ), focuses on scenarios where the distribution of the labels conditioned on the feature vector is the same, but the…
In multi-label learning, each sample is associated with several labels. Existing works indicate that exploring correlations between labels improves the prediction performance. However, embedding the label correlations into the training…
Classifier predictions often rely on the assumption that new observations come from the same distribution as training data. When the underlying distribution changes, so does the optimal classification rule, and performance may degrade. We…
A significant obstacle in the development of robust machine learning models is covariate shift, a form of distribution shift that occurs when the input distributions of the training and test sets differ while the conditional label…
Concept shift occurs when the distribution of labels conditioned on the features changes between domains, which can make even a well-tuned ML model miscalibrated on a new domain. Identifying these shifted features provides unique insight…
In selective classification (SC), a classifier abstains from making predictions that are likely to be wrong to avoid excessive errors. To deploy imperfect classifiers -- either due to intrinsic statistical noise of data or for robustness…
What sorts of structure might enable a learner to discover classes from unlabeled data? Traditional approaches rely on feature-space similarity and heroic assumptions on the data. In this paper, we introduce unsupervised learning under…
We propose a simple, statistically principled, and theoretically justified method to improve supervised learning when the training set is not representative, a situation known as covariate shift. We build upon a well-established methodology…
We show that in the context of classification the property of source and target distributions to be related by covariate shift may be lost if the information content captured in the covariates is reduced, for instance by dropping components…
Machine learning systems are often trained and evaluated for fairness on historical data, yet deployed in environments where conditions have shifted. A particularly common form of shift occurs when the prevalence of positive outcomes…