Related papers: Missing Data Imputation for Classification Problem…

Missing Data Imputation for Supervised Learning

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks.…

Machine Learning · Statistics 2020-08-11 Jason Poulos, Rafael Valle

Nearest Neighbor Imputation for Categorical Data by Weighting of Attributes

Missing values are a common phenomenon in all areas of applied research. While various imputation methods are available for metrically scaled variables, methods for categorical data are scarce. An imputation method that has been shown to…

Methodology · Statistics 2017-10-04 Shahla Faisal, Gerhard Tutz

Imputation of Missing Data with Class Imbalance using Conditional Generative Adversarial Networks

Missing data is a common problem faced with real-world datasets. Imputation is a widely used technique to estimate the missing data. State-of-the-art imputation approaches, such as Generative Adversarial Imputation Nets (GAIN), model the…

Machine Learning · Computer Science 2020-12-02 Saqib Ejaz Awan, Mohammed Bennamoun, Ferdous Sohel, Frank M Sanfilippo, Girish Dwivedi

A computational study on imputation methods for missing environmental data

Data acquisition and recording in the form of databases are routine operations. The process of collecting data, however, may experience irregularities, resulting in databases with missing data. Missing entries might alter analysis…

Databases · Computer Science 2021-08-24 Paul Dixneuf, Fausto Errico, Mathias Glaus

Handling Missing Data in Downstream Tasks With Distribution-Preserving Guarantees

Missing feature values are a significant hurdle for downstream machine-learning tasks such as classification. However, imputation methods for classification might be time-consuming for high-dimensional data, and offer few theoretical…

Machine Learning · Computer Science 2025-05-15 Rahul Bordoloi, Clémence Réda, Saptarshi Bej, Olaf Wolkenhauer

Label-Guided Imputation via Forest-Based Proximities for Improved Time Series Classification

Missing data is a common problem in time series data. Most methods for imputation ignore label information pertaining to the time series even if that information exists. In this paper, we provide a framework for missing data imputation in…

Machine Learning · Statistics 2025-09-30 Jake S. Rhodes, Adam G. Rustad, Sofia Pelagalli Maia, Evan Thacker, Hyunmi Choi, Jose Gutierrez, Tatjana Rundek, Ben Shaw

Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based Method

Data imputation, the process of filling in missing feature elements for incomplete data sets, plays a crucial role in data-driven learning. A fundamental belief is that data imputation is helpful for learning performance, and it follows…

Machine Learning · Computer Science 2025-09-30 Ruikai Yang, Fan He, Mingzhen He, Kaijie Wang, Xiaolin Huang

Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach

Numerical data imputation algorithms replace missing values by estimates to leverage incomplete data sets. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. But this strategy…

Machine Learning · Statistics 2023-07-11 Florian Lalande, Kenji Doya

Dimensional Data KNN-Based Imputation

Data Warehouses (DWs) are core components of Business Intelligence (BI). Missing data in DWs have a great impact on data analyses. Therefore, missing data need to be completed. Unlike other existing data imputation methods mainly adapted…

Databases · Computer Science 2022-10-06 Yuzhao Yang, Jérôme Darmont, Franck Ravat, Olivier Teste

Handling Missing Data with Graph Representation Learning

Machine learning with missing data has been approached in two different ways, including feature imputation where missing feature values are estimated based on observed values, and label prediction where downstream labels are learned…

Machine Learning · Computer Science 2020-11-02 Jiaxuan You, Xiaobai Ma, Daisy Yi Ding, Mykel Kochenderfer, Jure Leskovec

Missing Data Imputation by Reducing Mutual Information with Rectified Flows

This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and the corresponding missingness mask. Inspired by GAN-based approaches that train generators to…

Machine Learning · Statistics 2025-11-26 Jiahao Yu, Qizhen Ying, Leyang Wang, Ziyue Jiang, Song Liu

An approach to dealing with missing values in heterogeneous data using k-nearest neighbors

Techniques such as clusterization, neural networks and decision making usually rely on algorithms that are not well suited to deal with missing values. However, real-world data frequently contains such cases. The simplest solution is to…

Machine Learning · Computer Science 2016-08-16 Davi E. N. Frossard, Igor O. Nunes, Renato A. Krohling

Missing Data Imputation using Optimal Transport

Missing data is a crucial issue when applying machine learning algorithms to real-world datasets. Starting from the simple assumption that two batches extracted randomly from the same dataset should share the same distribution, we leverage…

Machine Learning · Statistics 2020-07-02 Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi

Generative Imputation and Stochastic Prediction

In many machine learning applications, we are faced with incomplete datasets. In the literature, missing data imputation techniques have been mostly concerned with filling missing values. However, the existence of missing values is…

Machine Learning · Computer Science 2020-09-07 Mohammad Kachuee, Kimmo Karkkainen, Orpaz Goldstein, Sajad Darabi, Majid Sarrafzadeh

Impact Of Missing Data Imputation On The Fairness And Accuracy Of Graph Node Classifiers

Analysis of the fairness of machine learning (ML) algorithms has recently attracted many researchers' interest. Most ML methods show bias toward protected groups, which limits the applicability of ML models in many applications like crime rate…

Machine Learning · Computer Science 2022-11-03 Haris Mansoor, Sarwan Ali, Shafiq Alam, Muhammad Asad Khan, Umair ul Hassan, Imdadullah Khan

Multiple Imputation with Neural Network Gaussian Process for High-dimensional Incomplete Data

Missing data are ubiquitous in real-world applications and, if not adequately handled, may lead to the loss of information and biased findings in downstream analysis. Particularly, high-dimensional incomplete data with a moderate sample…

Machine Learning · Computer Science 2022-12-23 Zongyu Dai, Zhiqi Bu, Qi Long

A Comparison of Full Information Maximum Likelihood and Machine Learning Missing Data Analytical Methods in Growth Curve Modeling

Missing data are inevitable in longitudinal studies. Traditional methods, such as the full information maximum likelihood (FIML), are commonly used to handle ignorable missing data. However, they may lead to biased model estimation due to…

Applications · Statistics 2024-01-01 Dandan Tang, Xin Tong

An Interdisciplinary and Cross-Task Review on Missing Data Imputation

Missing data is a fundamental challenge in data science, significantly hindering analysis and decision-making across a wide range of disciplines, including healthcare, bioinformatics, social science, e-commerce, and industrial monitoring.…

Machine Learning · Statistics 2026-05-12 Jicong Fan

Controllable Missingness from Uncontrollable Missingness: Joint Learning Measurement Policy and Imputation

Due to the cost or interference of measurement, we need to control the measurement system. Assuming that each variable can be measured sequentially, there exists an optimal policy that chooses the next measurement based on the previous observations. Though…

Machine Learning · Computer Science 2022-04-11 Seongwook Yoon, Jaehyun Kim, Heejeong Lim, Sanghoon Sull

Imputation using training labels and classification via label imputation

Missing data is a common problem in practical data science settings. Various imputation methods have been developed to deal with missing data. However, even though the labels are available in the training data in many situations, the common…

Machine Learning · Computer Science 2025-01-30 Thu Nguyen, Tuan L. Vo, Pål Halvorsen, Michael A. Riegler