Related papers: Adapting tree-based multiple imputation methods fo…

Evaluating tree-based imputation methods as an alternative to MICE PMM for drawing inference in empirical studies

Dealing with missing data is an important problem in statistical analysis that is often addressed with imputation procedures. The performance and validity of such methods are of great importance for their application in empirical studies.…

Applications · Statistics 2024-01-19 Jakob Schwerter, Ketevan Gurtskaia, Andrés Romero, Birgit Zeyer-Gliozzo, Markus Pauly

Missing Value Estimation using Clustering and Deep Learning within Multiple Imputation Framework

Missing values in tabular data restrict the use and performance of machine learning, requiring the imputation of missing values. The most popular imputation algorithm is arguably multiple imputations using chains of equations (MICE), which…

Machine Learning · Computer Science 2022-03-01 Manar D Samad, Sakib Abrar, Norou Diawara

Are deep learning models superior for missing data imputation in large surveys? Evidence from an empirical comparison

Multiple imputation (MI) is a popular approach for dealing with missing data arising from non-response in sample surveys. Multiple imputation by chained equations (MICE) is one of the most widely used MI algorithms for multivariate data,…

Machine Learning · Computer Science 2022-03-22 Zhenhua Wang, Olanrewaju Akande, Jason Poulos, Fan Li

Multiple Imputation Through XGBoost

The use of multiple imputation (MI) is becoming increasingly popular for addressing missing data. Although some conventional MI approaches have been well studied and have shown empirical validity, they have limitations when processing large…

Methodology · Statistics 2023-07-31 Yongshi Deng, Thomas Lumley

Who wins the Miss Contest for Imputation Methods? Our Vote for Miss BooPF

Missing data is an expected issue when large amounts of data is collected, and several imputation techniques have been proposed to tackle this problem. Beneath classical approaches such as MICE, the application of Machine Learning…

Machine Learning · Statistics 2017-12-01 Burim Ramosaj, Markus Pauly

Multiple Imputation Methods under Extreme Values

Missing data are ubiquitous in empirical databases, yet statistical analyses typically require complete data matrices. Multiple imputation offers a principled solution for filling these gaps. This study evaluates the performance of several…

Computation · Statistics 2026-02-05 Enzo Porto Brasil

Interpretable Prediction Rule Ensembles in the Presence of Missing Data

Prediction Rule Ensembles (PREs) are robust and interpretable statistical learning techniques with potential for predictive analytics, yet their efficacy in the presence of missing data is untested. This study uses multiple imputation to…

Applications · Statistics 2024-10-22 Vincent Schroeder, Jakob Schwerter, Marjolein Fokkema, Philipp Doebler

In-Database Data Imputation

Missing data is a widespread problem in many domains, creating challenges in data analysis and decision making. Traditional techniques for dealing with missing data, such as excluding incomplete records or imputing simple estimates (e.g.,…

Databases · Computer Science 2024-01-09 Massimo Perini, Milos Nikolic

A Comparative Study of Imputation Methods for Multivariate Ordinal Data

Missing data remains a very common problem in large datasets, including survey and census data containing many ordinal responses, such as political polls and opinion surveys. Multiple imputation (MI) is usually the go-to approach for…

Methodology · Statistics 2024-12-25 Chayut Wongkamthong, Olanrewaju Akande

MissForest - nonparametric missing value imputation for mixed-type data

Modern data acquisition based on high-throughput technology is often facing the problem of missing data. Algorithms commonly used in the analysis of such large-scale data often depend on a complete set. Missing value imputation offers a…

Applications · Statistics 2014-06-03 Daniel J. Stekhoven, Peter Bühlmann

Tree Boosting Methods for Balanced and Imbalanced Classification and their Robustness Over Time in Risk Assessment

Most real-world classification problems deal with imbalanced datasets, posing a challenge for Artificial Intelligence (AI), i.e., machine learning algorithms, because the minority class, which is of extreme interest, often proves difficult…

Machine Learning · Computer Science 2025-04-28 Gissel Velarde, Michael Weichert, Anuj Deshmunkh, Sanjay Deshmane, Anindya Sudhir, Khushboo Sharma, Vaibhav Joshi

A cautionary tale on using imputation methods for inference in matched pairs design

Imputation procedures in biomedical fields have turned into statistical practice, since further analyses can be conducted ignoring the former presence of missing values. In particular, non-parametric imputation schemes like the random…

Applications · Statistics 2018-08-13 Burim Ramosaj, Lubna Amro, Markus Pauly

MIBoost: A gradient boosting algorithm for variable selection after multiple imputation

Statistical learning methods for automated variable selection, such as the Least Absolute Shrinkage and Selection Operator (LASSO), elastic nets, and gradient boosting, have become increasingly popular tools for building powerful prediction…

Machine Learning · Statistics 2026-04-13 Robert Kuchen

A stacked approach for chained equations multiple imputation incorporating the substantive model

Multiple imputation by chained equations (MICE) has emerged as a popular approach for handling missing data. A central challenge for applying MICE is determining how to incorporate outcome information into covariate imputation models,…

Methodology · Statistics 2019-10-11 Lauren Beesley, Jeremy M G Taylor

Multivariate Boosted Trees and Applications to Forecasting and Control

Gradient boosted trees are competition-winning, general-purpose, non-parametric regressors, which exploit sequential model fitting and gradient descent to minimize a specific loss function. The most popular implementations are tailored to…

Machine Learning · Computer Science 2022-08-23 Lorenzo Nespoli, Vasco Medici

A Robust Missing Value Imputation Method MifImpute For Incomplete Molecular Descriptor Data And Comparative Analysis With Other Missing Value Imputation Methods

Missing data imputation is an important research topic in data mining. Large-scale molecular descriptor data may contain missing values (MVs). However, some methods for downstream analyses, including some prediction tools, require a…

Computational Engineering, Finance, and Science · Computer Science 2013-12-13 Doreswamy, Chanabasayya M. Vastrad

Which Imputation Fits Which Feature Selection Method? A Survey-Based Simulation Study

Tree-based learning methods such as Random Forest and XGBoost are still the gold-standard prediction methods for tabular data. Feature importance measures are usually considered for feature selection as well as to assess the effect of…

Applications · Statistics 2024-12-19 Jakob Schwerter, Andrés Romero, Florian Dumpert, Markus Pauly

A computational study on imputation methods for missing environmental data

Data acquisition and recording in the form of databases are routine operations. The process of collecting data, however, may experience irregularities, resulting in databases with missing data. Missing entries might alter analysis…

Databases · Computer Science 2021-08-24 Paul Dixneuf, Fausto Errico, Mathias Glaus

Missing Pattern Tree based Decision Grouping and Ensemble for Enhancing Pair Utilization in Deep Incomplete Multi-View Clustering

Real-world multi-view data often exhibit highly inconsistent missing patterns, posing significant challenges for incomplete multi-view clustering (IMVC). Although existing IMVC methods have made progress from both imputation-based and…

Machine Learning · Computer Science 2026-04-21 Jie Xu, Wenyuan Yang, Yazhou Ren, Lifang He, Philip S. Yu, Xiaofeng Zhu

tBayes-MICE: A Bayesian Approach to Multiple Imputation for Time Series Data

Time-series analysis is often affected by missing data, a common problem across several fields, including healthcare and environmental monitoring. Multiple Imputation by Chained Equations (MICE) has been prominent for imputing missing…

Machine Learning · Statistics 2026-04-10 Amuche Ibenegbu, Pierre Lafaye de Micheaux, Rohitash Chandra