Related papers: A primer on linear classification with missing dat…

Missing Data Imputation for Supervised Learning

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks.…

Machine Learning · Statistics 2020-08-11 Jason Poulos, Rafael Valle

When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values

Predicting with missing inputs challenges even parametric models, as parameter estimation alone is insufficient for prediction on incomplete data. While several works study prediction in linear models, we focus on logistic models, where…

Machine Learning · Statistics 2026-02-03 Christophe Muller, Erwan Scornet, Julie Josse

Adaptive Optimization for Prediction with Missing Data

When training predictive models on data with missing entries, the most widely used and versatile approach is a pipeline technique where we first impute missing entries and then compute predictions. In this paper, we view prediction with…

Machine Learning · Computer Science 2025-02-25 Dimitris Bertsimas, Arthur Delarue, Jean Pauphilet

High-Dimensional Tensor Discriminant Analysis with Incomplete Tensors

Tensor classification is gaining importance across fields, yet handling partially observed data remains challenging. In this paper, we introduce a novel approach to tensor classification with incomplete data, framed within high-dimensional…

Machine Learning · Statistics 2024-11-01 Elynn Chen, Yuefeng Han, Jiayu Li

Random features models: a way to study the success of naive imputation

Constant (naive) imputation is still widely used in practice as this is a first easy-to-use technique to deal with missing data. Yet, this simple method could be expected to induce a large bias for prediction purposes, as the imputed input…

Statistics Theory · Mathematics 2024-02-07 Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet

Directly Handling Missing Data in Linear Discriminant Analysis for Enhancing Classification Accuracy and Interpretability

As the adoption of Artificial Intelligence (AI) models expands into critical real-world applications, ensuring the explainability of these models becomes paramount, particularly in sensitive fields such as medicine and finance. Linear…

Machine Learning · Computer Science 2024-10-10 Tuan L. Vo, Uyen Dang, Thu Nguyen

On the consistency of supervised learning with missing values

In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here,…

Machine Learning · Statistics 2024-03-22 Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux

Model Evaluation in the Dark: Robust Classifier Metrics with Missing Labels

Missing data in supervised learning is well-studied, but the specific issue of missing labels during model evaluation has been overlooked. Ignoring samples with missing values, a common solution, can introduce bias, especially when data is…

Machine Learning · Computer Science 2025-04-28 Danial Dervovic, Michael Cashmore

Semi-Supervised Learning with Multiple Imputations on Non-Random Missing Labels

Semi-Supervised Learning (SSL) is implemented when algorithms are trained on both labeled and unlabeled data. This is a very common application of ML as it is unrealistic to obtain a fully labeled dataset. Researchers have tackled three…

Machine Learning · Computer Science 2023-08-16 Jason Lu, Michael Ma, Huaze Xu, Zixi Xu

Missing Features Reconstruction and Its Impact on Classification Accuracy

In real-world applications, we can encounter situations when a well-trained model has to be used to predict from a damaged dataset. The damage caused by missing or corrupted values can be either on the level of individual instances or on…

Machine Learning · Computer Science 2019-11-12 Magda Friedjungová, Daniel Vašata, Marcel Jiřina

Estimation and imputation in Probabilistic Principal Component Analysis with Missing Not At Random data

Missing Not At Random (MNAR) values lead to significant biases in the data, since the probability of missingness depends on the unobserved values. They are "not ignorable" in the sense that they often require defining a model for the…

Statistics Theory · Mathematics 2020-06-11 Aude Sportisse, Claire Boyer, Julie Josse

Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based Method

Data imputation, the process of filling in missing feature elements for incomplete data sets, plays a crucial role in data-driven learning. A fundamental belief is that data imputation is helpful for learning performance, and it follows…

Machine Learning · Computer Science 2025-09-30 Ruikai Yang, Fan He, Mingzhen He, Kaijie Wang, Xiaolin Huang

Identifiable Deep Latent Variable Models for MNAR Data

Missing data is a ubiquitous challenge in data analysis, often leading to biased and inaccurate results. Traditional imputation methods usually assume that the missingness mechanism is missing-at-random (MAR), where the missingness is…

Methodology · Statistics 2026-03-30 Huiming Xie, Fei Xue, Xiao Wang

A Unified Framework for Inference with General Missingness Patterns and Machine Learning Imputation

Pre-trained machine learning (ML) predictions have been increasingly used to complement incomplete data to enable downstream scientific inquiries, but their naive integration risks biased inferences. Recently, multiple methods have been…

Methodology · Statistics 2025-11-12 Xingran Chen, Tyler McCormick, Bhramar Mukherjee, Zhenke Wu

Classification with Strategically Withheld Data

Machine learning techniques can be useful in applications such as credit approval and college admission. However, to be classified more favorably in such contexts, an agent may decide to strategically withhold some of her features, such as…

Machine Learning · Computer Science 2021-01-15 Anilesh K. Krishnaswamy, Haoming Li, David Rein, Hanrui Zhang, Vincent Conitzer

Missing Data in Signal Processing and Machine Learning: Models, Methods and Modern Approaches

This tutorial aims to provide signal processing (SP) and machine learning (ML) practitioners with vital tools, in an accessible way, to answer the question: How to deal with missing data? There are many strategies to handle incomplete…

Signal Processing · Electrical Eng. & Systems 2026-01-06 Alexandre Hippert-Ferrer, Aude Sportisse, Amirhossein Javaheri, Mohammed Nabil El Korso, Daniel P. Palomar

Probabilistic Imputation for Time-series Classification with Missing Data

Multivariate time series data for real-world applications typically contain a significant amount of missing values. The dominant approach for classification with such missing values is to impute them heuristically with specific values…

Machine Learning · Computer Science 2023-08-15 SeungHyun Kim, Hyunsu Kim, EungGu Yun, Hwangrae Lee, Jaehun Lee, Juho Lee

Propensity score estimation using classification and regression trees in the presence of missing covariate data

Data mining and machine learning techniques such as classification and regression trees (CART) represent a promising alternative to conventional logistic regression for propensity score estimation. Whereas incomplete data preclude the…

Machine Learning · Statistics 2018-07-26 Bas B. L. Penning de Vries, Maarten van Smeden, Rolf H. H. Groenwold

On missing label patterns in semi-supervised learning

We investigate model based classification with partially labelled training data. In many biostatistical applications, labels are manually assigned by experts, who may leave some observations unlabelled due to class uncertainty. We analyse…

Methodology · Statistics 2019-04-08 Daniel Ahfock, Geoffrey J. McLachlan

Missing Value Imputation With Unsupervised Backpropagation

Many data mining and data analysis techniques operate on dense matrices or complete tables of data. Real-world data sets, however, often contain unknown values. Even many classification algorithms that are designed to operate with missing…

Neural and Evolutionary Computing · Computer Science 2013-12-20 Michael S. Gashler, Michael R. Smith, Richard Morris, Tony Martinez