
Related papers: A primer on linear classification with missing dat…

200 papers

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks.…

Machine Learning · Statistics 2020-08-11 Jason Poulos, Rafael Valle

Predicting with missing inputs challenges even parametric models, as parameter estimation alone is insufficient for prediction on incomplete data. While several works study prediction in linear models, we focus on logistic models, where…

Machine Learning · Statistics 2026-02-03 Christophe Muller, Erwan Scornet, Julie Josse

When training predictive models on data with missing entries, the most widely used and versatile approach is a pipeline technique where we first impute missing entries and then compute predictions. In this paper, we view prediction with…

Machine Learning · Computer Science 2025-02-25 Dimitris Bertsimas, Arthur Delarue, Jean Pauphilet

Tensor classification is gaining importance across fields, yet handling partially observed data remains challenging. In this paper, we introduce a novel approach to tensor classification with incomplete data, framed within high-dimensional…

Machine Learning · Statistics 2024-11-01 Elynn Chen, Yuefeng Han, Jiayu Li

Constant (naive) imputation is still widely used in practice as this is a first easy-to-use technique to deal with missing data. Yet, this simple method could be expected to induce a large bias for prediction purposes, as the imputed input…

Statistics Theory · Mathematics 2024-02-07 Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet

As the adoption of Artificial Intelligence (AI) models expands into critical real-world applications, ensuring the explainability of these models becomes paramount, particularly in sensitive fields such as medicine and finance. Linear…

Machine Learning · Computer Science 2024-10-10 Tuan L. Vo, Uyen Dang, Thu Nguyen

In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here,…

Machine Learning · Statistics 2024-03-22 Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux

Missing data in supervised learning is well-studied, but the specific issue of missing labels during model evaluation has been overlooked. Ignoring samples with missing values, a common solution, can introduce bias, especially when data is…

Machine Learning · Computer Science 2025-04-28 Danial Dervovic, Michael Cashmore

Semi-Supervised Learning (SSL) is implemented when algorithms are trained on both labeled and unlabeled data. This is a very common application of ML as it is unrealistic to obtain a fully labeled dataset. Researchers have tackled three…

Machine Learning · Computer Science 2023-08-16 Jason Lu, Michael Ma, Huaze Xu, Zixi Xu

In real-world applications, we can encounter situations when a well-trained model has to be used to predict from a damaged dataset. The damage caused by missing or corrupted values can be either on the level of individual instances or on…

Machine Learning · Computer Science 2019-11-12 Magda Friedjungová, Daniel Vašata, Marcel Jiřina

Missing Not At Random (MNAR) values lead to significant biases in the data, since the probability of missingness depends on the unobserved values. They are "not ignorable" in the sense that they often require defining a model for the…

Statistics Theory · Mathematics 2020-06-11 Aude Sportisse, Claire Boyer, Julie Josse

Data imputation, the process of filling in missing feature elements for incomplete data sets, plays a crucial role in data-driven learning. A fundamental belief is that data imputation is helpful for learning performance, and it follows…

Machine Learning · Computer Science 2025-09-30 Ruikai Yang, Fan He, Mingzhen He, Kaijie Wang, Xiaolin Huang

Missing data is a ubiquitous challenge in data analysis, often leading to biased and inaccurate results. Traditional imputation methods usually assume that the missingness mechanism is missing-at-random (MAR), where the missingness is…

Methodology · Statistics 2026-03-30 Huiming Xie, Fei Xue, Xiao Wang

Pre-trained machine learning (ML) predictions have been increasingly used to complement incomplete data to enable downstream scientific inquiries, but their naive integration risks biased inferences. Recently, multiple methods have been…

Methodology · Statistics 2025-11-12 Xingran Chen, Tyler McCormick, Bhramar Mukherjee, Zhenke Wu

Machine learning techniques can be useful in applications such as credit approval and college admission. However, to be classified more favorably in such contexts, an agent may decide to strategically withhold some of her features, such as…

Machine Learning · Computer Science 2021-01-15 Anilesh K. Krishnaswamy, Haoming Li, David Rein, Hanrui Zhang, Vincent Conitzer

This tutorial aims to provide signal processing (SP) and machine learning (ML) practitioners with vital tools, in an accessible way, to answer the question: How to deal with missing data? There are many strategies to handle incomplete…

Signal Processing · Electrical Eng. & Systems 2026-01-06 Alexandre Hippert-Ferrer, Aude Sportisse, Amirhossein Javaheri, Mohammed Nabil El Korso, Daniel P. Palomar

Multivariate time series data for real-world applications typically contain a significant amount of missing values. The dominant approach for classification with such missing values is to impute them heuristically with specific values…

Machine Learning · Computer Science 2023-08-15 SeungHyun Kim, Hyunsu Kim, EungGu Yun, Hwangrae Lee, Jaehun Lee, Juho Lee

Data mining and machine learning techniques such as classification and regression trees (CART) represent a promising alternative to conventional logistic regression for propensity score estimation. Whereas incomplete data preclude the…

Machine Learning · Statistics 2018-07-26 Bas B. L. Penning de Vries, Maarten van Smeden, Rolf H. H. Groenwold

We investigate model based classification with partially labelled training data. In many biostatistical applications, labels are manually assigned by experts, who may leave some observations unlabelled due to class uncertainty. We analyse…

Methodology · Statistics 2019-04-08 Daniel Ahfock, Geoffrey J. McLachlan

Many data mining and data analysis techniques operate on dense matrices or complete tables of data. Real-world data sets, however, often contain unknown values. Even many classification algorithms that are designed to operate with missing…

Neural and Evolutionary Computing · Computer Science 2013-12-20 Michael S. Gashler, Michael R. Smith, Richard Morris, Tony Martinez