Related papers: Missing Data Imputation using Optimal Transport

Optimal Transport for Structure Learning Under Missing Data

Causal discovery in the presence of missing data introduces a chicken-and-egg dilemma. While the goal is to recover the true causal structure, robust imputation requires considering the dependencies or, preferably, causal relations among…

Machine Learning · Computer Science 2024-06-04 Vy Vo, He Zhao, Trung Le, Edwin V. Bonilla, Dinh Phung

Transformed Distribution Matching for Missing Value Imputation

We study the problem of imputing missing values in a dataset, which has important applications in many domains. The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values…

Machine Learning · Computer Science 2023-06-26 He Zhao, Ke Sun, Amir Dezfouli, Edwin Bonilla

Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization

Learning conditional distributions $\pi^*(\cdot|x)$ is a central problem in machine learning, which is typically approached via supervised methods with paired data $(x,y) \sim \pi^*$. However, acquiring paired data samples is often…

Machine Learning · Computer Science 2025-11-06 Mikhail Persiianov, Arip Asadulaev, Nikita Andreev, Nikita Starodubcev, Dmitry Baranchuk, Anastasis Kratsios, Evgeny Burnaev, Alexander Korotin

Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification

Imbalanced data pose challenges for deep learning based classification models. One of the most widely-used approaches for tackling imbalanced data is re-weighting, where training samples are associated with different weights in the loss…

Machine Learning · Computer Science 2022-08-08 Dandan Guo, Zhuo Li, Meixi Zheng, He Zhao, Mingyuan Zhou, Hongyuan Zha

Missing Data: A Comparison of Neural Network and Expectation Maximisation Techniques

The estimation of missing input vector elements in real time processing applications requires a system that possesses the knowledge of certain characteristics such as correlations between variables, which are inherent in the input space.…

Applications · Statistics 2007-05-23 Fulufhelo V. Nelwamondo, Shakir Mohamed, Tshilidzi Marwala

Machine learning with incomplete datasets using multi-objective optimization models

Machine learning techniques have been developed to learn from complete data. When missing values exist in a dataset, the incomplete data should be preprocessed separately by removing data points with missing values or imputation. In this…

Machine Learning · Computer Science 2020-12-25 Hadi A. Khorshidi, Michael Kirley, Uwe Aickelin

Distribution Shift in Missing Data Imputation: A Risk-Based Perspective and Importance-Weighted Correction under MAR

Missing data imputation, where a model is trained on observed data to estimate unobserved values, is a fundamental problem in machine learning. In this paper, we rigorously formulate imputation model learning as a mean-squared error risk…

Machine Learning · Statistics 2026-05-14 Luke Shannon, Song Liu, Katarzyna Reluga

Empirical Optimal Transport under Estimated Costs: Distributional Limits and Statistical Applications

Optimal transport (OT) based data analysis is often faced with the issue that the underlying cost function is (partially) unknown. This paper is concerned with the derivation of distributional limits for the empirical OT value when the cost…

Statistics Theory · Mathematics 2023-01-05 Shayan Hundrieser, Gilles Mordant, Christoph Alexander Weitkamp, Axel Munk

Distributional Limit Theory for Optimal Transport

Optimal Transport (OT) is a resource allocation problem with applications in biology, data science, economics and statistics, among others. In some of the applications, practitioners have access to samples which approximate the continuous…

Statistics Theory · Mathematics 2025-05-27 Eustasio del Barrio, Alberto González-Sanz, Jean-Michel Loubes, David Rodríguez-Vítores

Detecting Out-of-Distribution Samples via Conditional Distribution Entropy with Optimal Transport

When deploying a trained machine learning model in the real world, it is inevitable to receive inputs from out-of-distribution (OOD) sources. For instance, in continual learning settings, it is common to encounter OOD samples due to the…

Machine Learning · Computer Science 2024-01-23 Chuanwen Feng, Wenlong Chen, Ao Ke, Yilong Ren, Xike Xie, S. Kevin Zhou

Proposition of a Theoretical Model for Missing Data Imputation using Deep Learning and Evolutionary Algorithms

In the last couple of decades, there have been major advancements in the domain of missing data imputation. The techniques in the domain include amongst others: Expectation Maximization, Neural Networks with Evolutionary Algorithms or…

Neural and Evolutionary Computing · Computer Science 2015-12-07 Collins Leke, Tshilidzi Marwala, Satyakama Paul

An Interdisciplinary and Cross-Task Review on Missing Data Imputation

Missing data is a fundamental challenge in data science, significantly hindering analysis and decision-making across a wide range of disciplines, including healthcare, bioinformatics, social science, e-commerce, and industrial monitoring.…

Machine Learning · Statistics 2026-05-12 Jicong Fan

Optimal Transport with Heterogeneously Missing Data

We consider the problem of solving the optimal transport problem between two empirical distributions with missing values. Our main assumption is that the data is missing completely at random (MCAR), but we allow for heterogeneous…

Machine Learning · Statistics 2025-05-26 Linus Bleistein, Aurélien Bellet, Julie Josse

Classification of datasets with imputed missing values: does imputation quality matter?

Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods,…

Machine Learning · Computer Science 2023-12-20 Tolou Shadbahr, Michael Roberts, Jan Stanczuk, Julian Gilbey, Philip Teare, Sören Dittmer, Matthew Thorpe, Ramon Vinas Torne, Evis Sala, Pietro Lio, Mishal Patel, AIX-COVNET Collaboration, James H. F. Rudd, Tuomas Mirtti, Antti Rannikko, John A. D. Aston, Jing Tang, Carola-Bibiane Schönlieb

Predicting Out-of-Distribution Error with Confidence Optimal Transport

Out-of-distribution (OOD) data poses serious challenges in deployed machine learning models as even subtle changes could incur significant performance drops. Being able to estimate a model's performance on test data is important in practice…

Machine Learning · Computer Science 2023-02-13 Yuzhe Lu, Zhenlin Wang, Runtian Zhai, Soheil Kolouri, Joseph Campbell, Katia Sycara

Outlier-Robust Optimal Transport

Optimal transport (OT) measures distances between distributions in a way that depends on the geometry of the sample space. In light of recent advances in computational OT, OT distances are widely used as loss functions in machine learning.…

Methodology · Statistics 2021-06-22 Debarghya Mukherjee, Aritra Guha, Justin Solomon, Yuekai Sun, Mikhail Yurochkin

Missing Data Imputation for Supervised Learning

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks.…

Machine Learning · Statistics 2020-08-11 Jason Poulos, Rafael Valle

Efficient estimates of optimal transport via low-dimensional embeddings

Optimal transport distances (OT) have been widely used in recent work in Machine Learning as ways to compare probability distributions. These are costly to compute when the data lives in high dimension. Recent work by Paty et al., 2019,…

Machine Learning · Computer Science 2021-11-10 Patric M. Fulop, Vincent Danos

Statistical Inference for Optimal Transport Maps: Recent Advances and Perspectives

In many applications of optimal transport (OT), the object of primary interest is the optimal transport map. This map rearranges mass from one probability distribution to another in the most efficient way possible by minimizing a specified…

Statistics Theory · Mathematics 2025-06-25 Sivaraman Balakrishnan, Tudor Manole, Larry Wasserman

Making transport more robust and interpretable by moving data through a small number of anchor points

Optimal transport (OT) is a widely used technique for distribution alignment, with applications throughout the machine learning, graphics, and vision communities. Without any additional structural assumptions on transport, however, OT can…

Machine Learning · Computer Science 2021-07-20 Chi-Heng Lin, Mehdi Azabou, Eva L. Dyer