Related papers: Evaluating the Impact of Missing Data Imputation t…

Estimation of Missing Data Using Computational Intelligence and Decision Trees

This paper introduces a novel paradigm to impute missing data that combines a decision tree with an auto-associative neural network (AANN) based model and a principal component analysis-neural network (PCA-NN) based model. For each model,…

Applications · Statistics 2007-09-12 George Ssali , Tshilidzi Marwala

Random Forest Missing Data Algorithms

Random forest (RF) missing data algorithms are an attractive approach for dealing with missing data. They have the desirable properties of being able to handle mixed types of missing data, they are adaptive to interactions and nonlinearity,…

Machine Learning · Statistics 2017-01-23 Fei Tang , Hemant Ishwaran

MissForest - nonparametric missing value imputation for mixed-type data

Modern data acquisition based on high-throughput technology is often facing the problem of missing data. Algorithms commonly used in the analysis of such large-scale data often depend on a complete set. Missing value imputation offers a…

Applications · Statistics 2014-06-03 Daniel J. Stekhoven , Peter Bühlmann

Who wins the Miss Contest for Imputation Methods? Our Vote for Miss BooPF

Missing data is an expected issue when large amounts of data is collected, and several imputation techniques have been proposed to tackle this problem. Beneath classical approaches such as MICE, the application of Machine Learning…

Machine Learning · Statistics 2017-12-01 Burim Ramosaj , Markus Pauly

Missing Data using Decision Forest and Computational Intelligence

Autoencoder neural network is implemented to estimate the missing data. Genetic algorithm is implemented for network optimization and estimating the missing data. Missing data is treated as Missing At Random mechanism by implementing…

Machine Learning · Statistics 2008-12-10 D. Moon , T. Marwala

missForestPredict -- Missing data imputation for prediction settings

Prediction models are used to predict an outcome based on input variables. Missing data in input variables often occurs at model development and at prediction time. The missForestPredict R package proposes an adaptation of the missForest…

Methodology · Statistics 2024-07-08 Elena Albu , Shan Gao , Laure Wynants , Ben Van Calster

Which Imputation Fits Which Feature Selection Method? A Survey-Based Simulation Study

Tree-based learning methods such as Random Forest and XGBoost are still the gold-standard prediction methods for tabular data. Feature importance measures are usually considered for feature selection as well as to assess the effect of…

Applications · Statistics 2024-12-19 Jakob Schwerter , Andrés Romero , Florian Dumpert , Markus Pauly

Comparison of Data Imputation Techniques and their Impact

Missing and incomplete information in surveys or databases can be imputed using different statistical and soft-computing techniques. This paper comprehensively compares auto-associative neural networks (NN), neuro-fuzzy (NF) systems and the…

Methodology · Statistics 2008-12-09 Darren Blend , Tshilidzi Marwala

Influence of parallel computing strategies of iterative imputation of missing data: a case study on missForest

Machine learning iterative imputation methods have been well accepted by researchers for imputing missing data, but they can be time-consuming when handling large datasets. To overcome this drawback, parallel computing strategies have been…

Applications · Statistics 2020-04-24 Shangzhi Hong , Yuqi Sun , Hanying Li , Henry S. Lynn

Label-Guided Imputation via Forest-Based Proximities for Improved Time Series Classification

Missing data is a common problem in time series data. Most methods for imputation ignore label information pertaining to the time series even if that information exists. In this paper, we provide a framework for missing data imputation in…

Machine Learning · Statistics 2025-09-30 Jake S. Rhodes , Adam G. Rustad , Sofia Pelagalli Maia , Evan Thacker , Hyunmi Choi , Jose Gutierrez , Tatjana Rundek , Ben Shaw

A Robust Missing Value Imputation Method MifImpute For Incomplete Molecular Descriptor Data And Comparative Analysis With Other Missing Value Imputation Methods

Missing data imputation is an important research topic in data mining. Large-scale Molecular descriptor data may contains missing values (MVs). However, some methods for downstream analyses, including some prediction tools, require a…

Computational Engineering, Finance, and Science · Computer Science 2013-12-13 Doreswamy , Chanabasayya . M. Vastrad

Confidence Intervals for Random Forest Permutation Importance with Missing Data

Random Forests are renowned for their predictive accuracy, but valid inference, particularly about permutation-based feature importances, remains challenging. Existing methods, such as the confidence intervals (CIs) from Ishwaran et al.…

Methodology · Statistics 2025-07-21 Nico Föge , Markus Pauly

On the Performance of Imputation Techniques for Missing Values on Healthcare Datasets

Missing values or data is one popular characteristic of real-world datasets, especially healthcare data. This could be frustrating when using machine learning algorithms on such datasets, simply because most machine learning models perform…

Machine Learning · Computer Science 2024-03-25 Luke Oluwaseye Joel , Wesley Doorsamy , Babu Sena Paul

Generative Imputation and Stochastic Prediction

In many machine learning applications, we are faced with incomplete datasets. In the literature, missing data imputation techniques have been mostly concerned with filling missing values. However, the existence of missing values is…

Machine Learning · Computer Science 2020-09-07 Mohammad Kachuee , Kimmo Karkkainen , Orpaz Goldstein , Sajad Darabi , Majid Sarrafzadeh

Estimating Viral Genetic Linkage Rates in the Presence of Missing Data

Although the interest in the the use of social and information networks has grown, most inferences on networks assume the data collected represents the complete. However, when ignoring missing data, even when missing completely at random,…

Methodology · Statistics 2022-03-25 Tyler Vu , Tuo Lin , Jingjing Zou , Vladimir Novitsky , Xin Tu , Victor De Gruttola

Missing Data Imputation for Supervised Learning

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks.…

Machine Learning · Statistics 2020-08-11 Jason Poulos , Rafael Valle

Evaluation of data imputation strategies in complex, deeply-phenotyped data sets: the case of the EU-AIMS Longitudinal European Autism Project

An increasing number of large-scale multi-modal research initiatives has been conducted in the typically developing population, as well as in psychiatric cohorts. Missing data is a common problem in such datasets due to the difficulty of…

Machine Learning · Computer Science 2022-01-25 A. Llera , M. Brammer , B. Oakley , J. Tillmann , M. Zabihi , T. Mei , T. Charman , C. Ecker , F. Dell Acqua , T. Banaschewski , C. Moessnang , S. Baron-Cohen , R. Holt , S. Durston , D. Murphy , E. Loth , J. K. Buitelaar , D. L. Floris , C. F. Beckmann

Imputation using training labels and classification via label imputation

Missing data is a common problem in practical data science settings. Various imputation methods have been developed to deal with missing data. However, even though the labels are available in the training data in many situations, the common…

Machine Learning · Computer Science 2025-01-30 Thu Nguyen , Tuan L. Vo , Pål Halvorsen , Michael A. Riegler

A comparison of multiple imputation methods for bivariate hierarchical outcomes

Missing observations are common in cluster randomised trials. Approaches taken to handling such missing data include: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed…

Methodology · Statistics 2014-07-18 Karla Diaz-Ordaz , Michael G. Kenward , Manuel Gomes , Richard Grieve

A computational study on imputation methods for missing environmental data

Data acquisition and recording in the form of databases are routine operations. The process of collecting data, however, may experience irregularities, resulting in databases with missing data. Missing entries might alter analysis…

Databases · Computer Science 2021-08-24 Paul Dixneuf , Fausto Errico , Mathias Glaus