Related papers: Missing Data Imputation using Optimal Transport

200 papers

Causal discovery in the presence of missing data introduces a chicken-and-egg dilemma. While the goal is to recover the true causal structure, robust imputation requires considering the dependencies or, preferably, causal relations among…

Machine Learning · Computer Science 2024-06-04 Vy Vo, He Zhao, Trung Le, Edwin V. Bonilla, Dinh Phung

We study the problem of imputing missing values in a dataset, which has important applications in many domains. The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values…

Machine Learning · Computer Science 2023-06-26 He Zhao, Ke Sun, Amir Dezfouli, Edwin Bonilla

Learning conditional distributions $\pi^*(\cdot|x)$ is a central problem in machine learning, which is typically approached via supervised methods with paired data $(x,y) \sim \pi^*$. However, acquiring paired data samples is often…

Imbalanced data pose challenges for deep learning based classification models. One of the most widely-used approaches for tackling imbalanced data is re-weighting, where training samples are associated with different weights in the loss…

Machine Learning · Computer Science 2022-08-08 Dandan Guo, Zhuo Li, Meixi Zheng, He Zhao, Mingyuan Zhou, Hongyuan Zha

The estimation of missing input vector elements in real-time processing applications requires a system that possesses knowledge of certain characteristics, such as correlations between variables, which are inherent in the input space.…

Applications · Statistics 2007-05-23 Fulufhelo V. Nelwamondo, Shakir Mohamed, Tshilidzi Marwala

Machine learning techniques have been developed to learn from complete data. When missing values exist in a dataset, the incomplete data should be preprocessed separately, either by removing data points with missing values or by imputation. In this…

Machine Learning · Computer Science 2020-12-25 Hadi A. Khorshidi, Michael Kirley, Uwe Aickelin
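The abstract above names the two usual preprocessing routes for incomplete data: removing data points that contain missing values, or imputing them. The following is a minimal sketch contrasting the two on a toy table, assuming missing entries are encoded as `None`. Column-mean imputation is used here only as the simplest possible imputer; it is not the method of the paper listed above, and the helper names `drop_incomplete` and `mean_impute` are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing entry (assumed encoding)


def drop_incomplete(rows: List[Row]) -> List[List[float]]:
    """Listwise deletion: keep only rows with no missing entries."""
    return [list(row) for row in rows if all(v is not None for v in row)]


def mean_impute(rows: List[Row]) -> List[List[float]]:
    """Replace each missing entry with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # A column with no observed values gets NaN, since no mean exists.
        means.append(sum(observed) / len(observed) if observed else math.nan)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


data = [
    [1.0, 2.0],
    [None, 4.0],
    [3.0, None],
    [5.0, 6.0],
]
print(drop_incomplete(data))  # [[1.0, 2.0], [5.0, 6.0]]
print(mean_impute(data))      # [[1.0, 2.0], [3.0, 4.0], [3.0, 4.0], [5.0, 6.0]]
```

Deletion discards half of this toy table, while mean imputation keeps every row but places all imputed entries of a column at one value, which understates that column's spread.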

Missing data imputation, where a model is trained on observed data to estimate unobserved values, is a fundamental problem in machine learning. In this paper, we rigorously formulate imputation model learning as a mean-squared error risk…

Machine Learning · Statistics 2026-05-14 Luke Shannon, Song Liu, Katarzyna Reluga

Optimal transport (OT) based data analysis is often faced with the issue that the underlying cost function is (partially) unknown. This paper is concerned with the derivation of distributional limits for the empirical OT value when the cost…

Statistics Theory · Mathematics 2023-01-05 Shayan Hundrieser, Gilles Mordant, Christoph Alexander Weitkamp, Axel Munk

Optimal Transport (OT) is a resource allocation problem with applications in biology, data science, economics and statistics, among others. In some of the applications, practitioners have access to samples which approximate the continuous…

When a trained machine learning model is deployed in the real world, it will inevitably receive inputs from out-of-distribution (OOD) sources. For instance, in continual learning settings, it is common to encounter OOD samples due to the…

Machine Learning · Computer Science 2024-01-23 Chuanwen Feng, Wenlong Chen, Ao Ke, Yilong Ren, Xike Xie, S. Kevin Zhou

In the last couple of decades, there have been major advancements in the domain of missing data imputation. The techniques in the domain include amongst others: Expectation Maximization, Neural Networks with Evolutionary Algorithms or…

Neural and Evolutionary Computing · Computer Science 2015-12-07 Collins Leke, Tshilidzi Marwala, Satyakama Paul

Missing data is a fundamental challenge in data science, significantly hindering analysis and decision-making across a wide range of disciplines, including healthcare, bioinformatics, social science, e-commerce, and industrial monitoring.…

Machine Learning · Statistics 2026-05-12 Jicong Fan

We consider the problem of solving the optimal transport problem between two empirical distributions with missing values. Our main assumption is that the data is missing completely at random (MCAR), but we allow for heterogeneous…

Machine Learning · Statistics 2025-05-26 Linus Bleistein, Aurélien Bellet, Julie Josse
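MCAR (missing completely at random), the assumption named in the abstract above, means that whether an entry is missing does not depend on any value in the data, observed or not. A minimal sketch of generating such a mask follows. The function names `mcar_mask` and `apply_mask` and the single shared missingness probability are illustrative assumptions; the snippet does not attempt the heterogeneous extension that the truncated abstract mentions.

```python
import random
from typing import List, Optional


def mcar_mask(n_rows: int, n_cols: int, p_missing: float, seed: int = 0) -> List[List[bool]]:
    """Return a mask where True means 'missing'.

    Each entry is dropped with the same probability p_missing, independently of
    every other entry and of the data values themselves (the MCAR property).
    """
    if not 0.0 <= p_missing <= 1.0:
        raise ValueError("p_missing must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the mask is reproducible
    return [[rng.random() < p_missing for _ in range(n_cols)] for _ in range(n_rows)]


def apply_mask(rows: List[List[float]], mask: List[List[bool]]) -> List[List[Optional[float]]]:
    """Hide masked entries by replacing them with None."""
    return [[None if missing else v for v, missing in zip(row, mrow)]
            for row, mrow in zip(rows, mask)]


mask = mcar_mask(1000, 5, 0.2, seed=42)
fraction_missing = sum(sum(r) for r in mask) / (1000 * 5)
print(fraction_missing)  # close to 0.2; the exact value depends on the seed

table = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(apply_mask(table, mcar_mask(2, 3, 0.5, seed=1)))
```

Because the mask is drawn without looking at `table`, the observed entries remain a uniformly thinned sample of the full data, which is what makes MCAR the most tractable missingness assumption.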

Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods,…

Out-of-distribution (OOD) data poses serious challenges in deployed machine learning models as even subtle changes could incur significant performance drops. Being able to estimate a model's performance on test data is important in practice…

Machine Learning · Computer Science 2023-02-13 Yuzhe Lu, Zhenlin Wang, Runtian Zhai, Soheil Kolouri, Joseph Campbell, Katia Sycara

Optimal transport (OT) measures distances between distributions in a way that depends on the geometry of the sample space. In light of recent advances in computational OT, OT distances are widely used as loss functions in machine learning.…

Methodology · Statistics 2021-06-22 Debarghya Mukherjee, Aritra Guha, Justin Solomon, Yuekai Sun, Mikhail Yurochkin
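The abstract above notes that OT distances depend on the geometry of the sample space. The sketch below illustrates this in the simplest case: two equal-size samples on the real line, where the optimal coupling for the cost |x - y|^p with p >= 1 matches the samples in sorted order. It compares W_1 with the total variation distance between the same empirical distributions, which ignores how far apart the points are. The function names are hypothetical, and this is not the estimator studied in the paper.

```python
from collections import Counter
from typing import Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float], p: float = 1.0) -> float:
    """W_p between two uniform empirical measures on the real line with equal sample counts.

    In one dimension the optimal coupling for the cost |x - y|^p (p >= 1) pairs
    the i-th smallest x with the i-th smallest y, so no solver is needed.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    if p < 1:
        raise ValueError("p must be >= 1")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    total = sum(abs(a - b) ** p for a, b in zip(xs_sorted, ys_sorted))
    return (total / len(xs)) ** (1.0 / p)


def total_variation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Total variation between two empirical distributions; blind to distances between points."""
    cx, cy = Counter(xs), Counter(ys)
    support = set(cx) | set(cy)
    return 0.5 * sum(abs(cx[s] / len(xs) - cy[s] / len(ys)) for s in support)


base = [0.0, 1.0, 2.0]
near = [x + 0.5 for x in base]
far = [x + 10.0 for x in base]

print(wasserstein_1d(base, near), wasserstein_1d(base, far))    # 0.5 10.0
print(total_variation(base, near), total_variation(base, far))  # both approximately 1.0
```

Shifting the second sample by 0.5 or by 10 leaves the total variation at 1, because the supports are disjoint in both cases, while W_1 grows with the size of the shift.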

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks.…

Machine Learning · Statistics 2020-08-11 Jason Poulos, Rafael Valle

Optimal transport distances (OT) have been widely used in recent work in machine learning as ways to compare probability distributions. These are costly to compute when the data lives in high dimension. Recent work by Paty et al., 2019,…

Machine Learning · Computer Science 2021-11-10 Patric M. Fulop, Vincent Danos

In many applications of optimal transport (OT), the object of primary interest is the optimal transport map. This map rearranges mass from one probability distribution to another in the most efficient way possible by minimizing a specified…

Statistics Theory · Mathematics 2025-06-25 Sivaraman Balakrishnan, Tudor Manole, Larry Wasserman
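The OT map described in the abstract above can be computed exactly by exhaustive search in a toy setting: two uniform empirical measures with the same number of points and squared Euclidean cost, where an optimal coupling can be taken to be a permutation. In one dimension that optimum coincides with the monotone rearrangement (k-th smallest point to k-th smallest point), which the sketch checks. This is an illustration under those assumptions only; it is not one of the estimators analyzed in the paper, the function names are hypothetical, and the O(n!) search is usable only for a handful of points.

```python
from itertools import permutations
from typing import Sequence, Tuple


def ot_map_bruteforce(xs: Sequence[float], ys: Sequence[float]) -> Tuple[int, ...]:
    """Optimal assignment between two uniform empirical measures of equal size.

    Returns sigma such that xs[i] is sent to ys[sigma[i]], minimizing the total
    squared-distance cost over all n! permutations.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    n = len(xs)
    return min(
        permutations(range(n)),
        key=lambda sigma: sum((xs[i] - ys[sigma[i]]) ** 2 for i in range(n)),
    )


def monotone_map(xs: Sequence[float], ys: Sequence[float]) -> Tuple[int, ...]:
    """Send the k-th smallest x to the k-th smallest y (1-D monotone rearrangement)."""
    x_order = sorted(range(len(xs)), key=lambda i: xs[i])
    y_order = sorted(range(len(ys)), key=lambda j: ys[j])
    sigma = [0] * len(xs)
    for i, j in zip(x_order, y_order):
        sigma[i] = j
    return tuple(sigma)


xs = [3.0, 1.0, 2.0]
ys = [10.0, 30.0, 20.0]
print(ot_map_bruteforce(xs, ys))  # (1, 0, 2)
print(monotone_map(xs, ys))       # (1, 0, 2)
```

Both functions send 1.0 to 10.0, 2.0 to 20.0 and 3.0 to 30.0, so the exhaustive search confirms the sorted matching for this input.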

Optimal transport (OT) is a widely used technique for distribution alignment, with applications throughout the machine learning, graphics, and vision communities. Without any additional structural assumptions on transport, however, OT can…

Machine Learning · Computer Science 2021-07-20 Chi-Heng Lin, Mehdi Azabou, Eva L. Dyer