Related papers: Factorizable Joint Shift in Multinomial Classifica…
Factorizable joint shift (FJS) represents a type of distribution shift (or dataset shift) that comprises both covariate and label shift. Recently, it has been observed that FJS actually arises from consecutive label and covariate (or vice…
Existing domain adaptation (DA) usually assumes the domain shift comes from either the covariates or the labels. However, in real-world applications, samples selected from different domains could have biases in both the covariates and the…
Sparse joint shift (SJS) was recently proposed as a tractable model for general dataset shift which may cause changes to the marginal distributions of features and labels as well as the posterior probabilities and the class-conditional…
We study the problem of class distribution estimation under dataset shift. On the training dataset, both features and class labels are observed while on the test dataset only the features can be observed. The task then is the estimation of…
Federated Learning (FL) enables collaborative model training across decentralized clients without sharing private data. However, FL suffers from biased global models due to non-IID and long-tail data distributions. We propose…
When machine learning supports decision-making in safety-critical systems, it is important to verify and understand the reasons why a particular output is produced. Although feature importance calculation approaches assist in…
The task of Few-shot learning (FSL) aims to transfer the knowledge learned from base categories with sufficient labelled data to novel categories with scarce known information. It is currently an important research question and has great…
With the widespread use of machine learning to support decision-making, it is increasingly important to verify and understand the reasons why a particular output is produced. Although post-training feature importance approaches assist this…
It is widely held that one cause of downstream bias in classifiers is bias present in the training data. Rectifying such biases may involve context-dependent interventions such as training separate models on subgroups, removing features…
Conjoint analysis is a popular experimental design used to measure multidimensional preferences. Researchers examine how varying a factor of interest, while controlling for other relevant factors, influences decision-making. Currently,…
Federated learning aims to learn a global model collaboratively while the training data belongs to different clients and is not allowed to be exchanged. However, the statistical heterogeneity challenge on non-IID data, such as class…
Feature selection has been proven a powerful preprocessing step for high-dimensional data analysis. However, most state-of-the-art methods tend to overlook the structural correlation information between pairwise samples, which may…
The goal of this article is to select important variables that can distinguish one class of data from another. A marginal variable selection method ranks the marginal effects for classification of individual variables, and is a useful and…
Federated Learning (FL) commonly relies on a central server to coordinate training across distributed clients. While effective, this paradigm suffers from significant communication overhead, impacting overall training efficiency. To…
Probabilistic classification of unassociated Fermi-LAT sources using machine learning methods has an implicit assumption that the distributions of associated and unassociated sources are the same as a function of source parameters, which is…
Modern data science applications often involve complex relational data with dynamic structures. An abrupt change in such dynamic relational data is typically observed in systems that undergo regime changes due to interventions. In such a…
Distributed and federated learning are important tools for high-dimensional classification of large datasets. To reduce computational costs and overcome the curse of dimensionality, feature screening plays a pivotal role in eliminating…
Feature alignment serves as the primary mechanism for fusing multimodal data. We put forth a feature alignment approach that achieves full integration of multimodal information. This is accomplished via an alternating process of shifting…
Large Language Models (LLMs) are being increasingly used within data systems to process large datasets with text fields. A broad class of such tasks involves a semantic join-joining two tables based on a natural language predicate per pair…
It is increasingly common to collect data of multiple different types on the same set of samples. Our focus is on studying relationships between such multiview features and responses. A motivating application arises in the context of…