Related papers: Missing Data Imputation using Optimal Transport
Causal discovery in the presence of missing data introduces a chicken-and-egg dilemma. While the goal is to recover the true causal structure, robust imputation requires considering the dependencies or, preferably, causal relations among…
We study the problem of imputing missing values in a dataset, which has important applications in many domains. The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values…
Learning conditional distributions $\pi^*(\cdot|x)$ is a central problem in machine learning, which is typically approached via supervised methods with paired data $(x,y) \sim \pi^*$. However, acquiring paired data samples is often…
Imbalanced data pose challenges for deep learning based classification models. One of the most widely-used approaches for tackling imbalanced data is re-weighting, where training samples are associated with different weights in the loss…
The estimation of missing input vector elements in real time processing applications requires a system that possesses the knowledge of certain characteristics such as correlations between variables, which are inherent in the input space.…
Machine learning techniques have been developed to learn from complete data. When missing values exist in a dataset, the incomplete data should be preprocessed separately by removing data points with missing values or imputation. In this…
Missing data imputation, where a model is trained on observed data to estimate unobserved values, is a fundamental problem in machine learning. In this paper, we rigorously formulate imputation model learning as a mean-squared error risk…
Optimal transport (OT) based data analysis is often faced with the issue that the underlying cost function is (partially) unknown. This paper is concerned with the derivation of distributional limits for the empirical OT value when the cost…
Optimal Transport (OT) is a resource allocation problem with applications in biology, data science, economics and statistics, among others. In some of the applications, practitioners have access to samples which approximate the continuous…
When deploying a trained machine learning model in the real world, it is inevitable to receive inputs from out-of-distribution (OOD) sources. For instance, in continual learning settings, it is common to encounter OOD samples due to the…
In the last couple of decades, there has been major advancements in the domain of missing data imputation. The techniques in the domain include amongst others: Expectation Maximization, Neural Networks with Evolutionary Algorithms or…
Missing data is a fundamental challenge in data science, significantly hindering analysis and decision-making across a wide range of disciplines, including healthcare, bioinformatics, social science, e-commerce, and industrial monitoring.…
We consider the problem of solving the optimal transport problem between two empirical distributions with missing values. Our main assumption is that the data is missing completely at random (MCAR), but we allow for heterogeneous…
Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods,…
Out-of-distribution (OOD) data poses serious challenges in deployed machine learning models as even subtle changes could incur significant performance drops. Being able to estimate a model's performance on test data is important in practice…
Optimal transport (OT) measures distances between distributions in a way that depends on the geometry of the sample space. In light of recent advances in computational OT, OT distances are widely used as loss functions in machine learning.…
Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks.…
Optimal transport distances (OT) have been widely used in recent work in Machine Learning as ways to compare probability distributions. These are costly to compute when the data lives in high dimension. Recent work by Paty et al., 2019,…
In many applications of optimal transport (OT), the object of primary interest is the optimal transport map. This map rearranges mass from one probability distribution to another in the most efficient way possible by minimizing a specified…
Optimal transport (OT) is a widely used technique for distribution alignment, with applications throughout the machine learning, graphics, and vision communities. Without any additional structural assumptions on trans-port, however, OT can…