Related papers: Integrating multi-source block-wise missing data i…

Imputation-Powered Inference

Modern multi-modal and multi-site data frequently suffer from blockwise missingness, where subsets of features are missing for groups of individuals, creating complex patterns that challenge standard inference methods. Existing approaches…

Methodology · Statistics 2025-09-18 Sarah Zhao , Emmanuel Candès

Imputation of missing data using multivariate Gaussian Linear Cluster-Weighted Modeling

Missing data arises when certain values are not recorded or observed for variables of interest. However, most of the statistical theory assume complete data availability. To address incomplete databases, one approach is to fill the gaps…

Methodology · Statistics 2023-08-15 Luis Alejandro Masmela-Caita , Thais Paiva Galletti , Marcos Oliveira Prates

Accounting for model uncertainty in multiple imputation under complex sampling

Multiple imputation provides an effective way to handle missing data. When several possible models are under consideration for the data, the multiple imputation is typically performed under a single-best model selected from the candidate…

Methodology · Statistics 2018-11-30 Gyuhyeong Goh , Jae Kwang Kim

The Bias and Efficiency of Incomplete-Data Estimators in Small Univariate Normal Samples

Widely used methods for analyzing missing data can be biased in small samples. To understand these biases, we evaluate in detail the situation where a small univariate normal sample, with values missing at random, is analyzed using either…

Statistics Theory · Mathematics 2017-03-27 Paul T. von Hippel

Meta-Imputation Balanced (MIB): An Ensemble Approach for Handling Missing Data in Biomedical Machine Learning

Missing data represents a fundamental challenge in machine learning applications, often reducing model performance and reliability. This problem is particularly acute in fields like bioinformatics and clinical machine learning, where…

Machine Learning · Computer Science 2025-09-04 Fatemeh Azad , Zoran Bosnić , Matjaž Kukar

Multiple imputation for multilevel data with continuous and binary variables

We present and compare multiple imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive…

Methodology · Statistics 2026-05-18 Vincent Audigier , Ian R. White , Shahab Jolani , Thomas P. A. Debray , Matteo Quartagno , James Carpenter , Stef van Buuren , Matthieu Resche-Rigon

Robust Simulation-Based Inference under Missing Data via Neural Processes

Simulation-based inference (SBI) methods typically require fully observed data to infer parameters of models with intractable likelihood functions. However, datasets often contain missing values due to incomplete observations, data…

Machine Learning · Computer Science 2025-03-04 Yogesh Verma , Ayush Bharti , Vikas Garg

Imputation of Missing Data Using Linear Gaussian Cluster-Weighted Modeling

Missing data theory deals with the statistical methods in the occurrence of missing data. Missing data occurs when some values are not stored or observed for variables of interest. However, most of the statistical theory assumes that data…

Methodology · Statistics 2021-10-26 Luis Alejandro Masmela-Caita , Thais Paiva Galletti , Marcos Oliveira Prates

Combining multiple imputation with raking of weights: An efficient and robust approach in the setting of nearly-true models

Multiple imputation provides us with efficient estimators in model-based methods for handling missing data under the true model. It is also well-understood that design-based estimators are robust methods that do not require accurately…

Methodology · Statistics 2020-06-11 Kyunghee Han , Pamela A. Shaw , Thomas Lumley

An ensemble learning method for variable selection: application to high dimensional data and missing values

Standard approaches for variable selection in linear models are not tailored to deal properly with high-dimensional and incomplete data. Currently, methods dedicated to high-dimensional data handle missing values by ad-hoc strategies, like…

Methodology · Statistics 2021-06-09 Avner Bar-Hen , Vincent Audigier

Optimized Linear Imputation

Often in real-world datasets, especially in high dimensional data, some feature values are missing. Since most data analysis and statistical methods do not handle gracefully missing values, the first step in the analysis requires the…

Machine Learning · Statistics 2016-12-08 Yehezkel S. Resheff , Daphna Weinshall

Imputation and Missing Indicators for handling missing data in the development and implementation of clinical prediction models: a simulation study

Background: Existing guidelines for handling missing data are generally not consistent with the goals of prediction modelling, where missing data can occur at any stage of the model pipeline. Multiple imputation (MI), often heralded as the…

Methodology · Statistics 2022-06-27 Rose Sisk , Matthew Sperrin , Niels Peek , Maarten van Smeden , Glen P. Martin

Multiple imputation with missing data indicators

Multiple imputation is a well-established general technique for analyzing data with missing values. A convenient way to implement multiple imputation is sequential regression multiple imputation (SRMI), also called chained equations…

Methodology · Statistics 2021-03-04 Lauren J Beesley , Irina Bondarenko , Michael R Elliott , Allison W Kurian , Steven J Katz , Jeremy M G Taylor

Variational Bayesian Multiple Imputation in High-Dimensional Regression Models With Missing Responses

Multiple imputation has become one of the standard methods in drawing inferences in many incomplete data applications. Applications of multiple imputation in relatively more complex settings, such as high-dimensional clustered data, require…

Methodology · Statistics 2025-04-08 Qiushuang Li , Recai Yucel

A Copula-based Imputation Model for Missing Data of Mixed Type in Multilevel Data Sets

We propose a copula based method to handle missing values in multivariate data of mixed types in multilevel data sets. Building upon the extended rank likelihood of \cite{hoff2007extending} and the multinomial probit model, our model is a…

Methodology · Statistics 2017-02-28 Jiali Wang , Bronwyn Loong , Anton H. Westveld , Alan H. Welsh

A comparison of strategies for selecting auxiliary variables for multiple imputation

Multiple imputation (MI) is a popular method for handling missing data. Auxiliary variables can be added to the imputation model(s) to improve MI estimates. However, the choice of which auxiliary variables to include in the imputation model…

Methodology · Statistics 2022-04-01 Rheanna M. Mainzer , Cattram D. Nguyen , John B. Carlin , Margarita Moreno-Betancur , Ian R. White , Katherine J. Lee

A Comparative Study of Imputation Methods for Multivariate Ordinal Data

Missing data remains a very common problem in large datasets, including survey and census data containing many ordinal responses, such as political polls and opinion surveys. Multiple imputation (MI) is usually the go-to approach for…

Methodology · Statistics 2024-12-25 Chayut Wongkamthong , Olanrewaju Akande

Multiple Imputation Methods under Extreme Values

Missing data are ubiquitous in empirical databases, yet statistical analyses typically require complete data matrices. Multiple imputation offers a principled solution for filling these gaps. This study evaluates the performance of several…

Computation · Statistics 2026-02-05 Enzo Porto Brasil

A comparison of multiple imputation methods for bivariate hierarchical outcomes

Missing observations are common in cluster randomised trials. Approaches taken to handling such missing data include: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed…

Methodology · Statistics 2014-07-18 Karla Diaz-Ordaz , Michael G. Kenward , Manuel Gomes , Richard Grieve

HMVI: Unifying Heterogeneous Attributes with Natural Neighbors for Missing Value Inference

Missing value imputation is a fundamental challenge in machine intelligence, heavily dependent on data completeness. Current imputation methods often handle numerical and categorical attributes independently, overlooking critical…

Machine Learning · Computer Science 2026-01-09 Xiaopeng Luo , Zexi Tan , Zhuowei Wang