Related papers: Missing Data and Prediction

Prediction with Missing Data: Target Probabilities and Missingness Mechanisms

Conditions ensuring optimal parameter estimation in the presence of missing data are well established in inference, typically relying on the Missing-at-Random (MAR) assumption. In prediction, similar principles are often assumed to apply.…

Methodology · Statistics 2026-03-19 Pierre Catoire, Robin Genuer, Cecile Proust-Lima

An efficient multiple imputation algorithm for control-based and delta-adjusted pattern mixture models using SAS

In clinical trials, mixed effects models for repeated measures (MMRM) and pattern mixture models (PMM) are often used to analyze longitudinal continuous outcomes. We describe a simple missing data imputation algorithm for the MMRM that can…

Methodology · Statistics 2016-10-13 Yongqiang Tang

Imputation and Missing Indicators for handling missing data in the development and implementation of clinical prediction models: a simulation study

Background: Existing guidelines for handling missing data are generally not consistent with the goals of prediction modelling, where missing data can occur at any stage of the model pipeline. Multiple imputation (MI), often heralded as the…

Methodology · Statistics 2022-06-27 Rose Sisk, Matthew Sperrin, Niels Peek, Maarten van Smeden, Glen P. Martin

When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values

Predicting with missing inputs challenges even parametric models, as parameter estimation alone is insufficient for prediction on incomplete data. While several works study prediction in linear models, we focus on logistic models, where…

Machine Learning · Statistics 2026-02-03 Christophe Muller, Erwan Scornet, Julie Josse

NeuMiss networks: differentiable programming for supervised learning with missing values

The presence of missing values makes supervised learning much more challenging. Indeed, previous work has shown that even when the response is a linear function of the complete data, the optimal predictor is a complex function of the…

Machine Learning · Computer Science 2020-11-05 Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gaël Varoquaux

Missingness-MDPs: Bridging the Theory of Missing Data and POMDPs

We introduce missingness-MDPs (miss-MDPs), a novel subclass of partially observable Markov decision processes (POMDPs) that incorporates the theory of missing data. A miss-MDP is a POMDP whose observation function is a missingness function,…

Artificial Intelligence · Computer Science 2026-05-13 Joshua Wendland, Markel Zubia, Roman Andriushchenko, Maris F. L. Galesloot, Milan Ceska, Henrik von Kleist, Thiago D. Simao, Maximilian Weininger, Nils Jansen

A Unified Framework for Inference with General Missingness Patterns and Machine Learning Imputation

Pre-trained machine learning (ML) predictions have been increasingly used to complement incomplete data to enable downstream scientific inquiries, but their naive integration risks biased inferences. Recently, multiple methods have been…

Methodology · Statistics 2025-11-12 Xingran Chen, Tyler McCormick, Bhramar Mukherjee, Zhenke Wu

Imputing With Predictive Mean Matching Can Be Severely Biased When Values Are Missing At Random

Predictive mean matching (PMM) is a popular imputation strategy that imputes missing values by borrowing observed values from other cases with similar expectations. We show that, unlike other imputation strategies, PMM is not guaranteed to…

Methodology · Statistics 2025-07-01 Paul T. von Hippel

Recursive Equations For Imputation Of Missing Not At Random Data With Sparse Pattern Support

A common approach for handling missing values in data analysis pipelines is multiple imputation via software packages such as MICE (Van Buuren and Groothuis-Oudshoorn, 2011) and Amelia (Honaker et al., 2011). These packages typically assume…

Methodology · Statistics 2025-07-23 Trung Phung, Kyle Reese, Ilya Shpitser, Rohit Bhattacharya

HI-PMK: A Data-Dependent Kernel for Incomplete Heterogeneous Data Representation

Handling incomplete and heterogeneous data remains a central challenge in real-world machine learning, where missing values may follow complex mechanisms (MCAR, MAR, MNAR) and features can be of mixed types (numerical and categorical).…

Machine Learning · Computer Science 2025-07-30 Youran Zhou, Mohamed Reda Bouadjenek, Jonathan Wells, Sunil Aryal

Incorporating Missing Data Considerations into Sample Size Calculations for Developing Clinical Prediction Models

Clinical prediction models must be developed using sufficiently large datasets to minimise overfitting and ensure robust predictive performance. Existing sample size calculations assume complete predictor data for all included participants,…

Methodology · Statistics 2026-05-11 Glen P. Martin, Sian Bladon, Rebecca Whittle, Molly Wells, Gary S. Collins, Richard D. Riley

Interpretable Prediction Rule Ensembles in the Presence of Missing Data

Prediction Rule Ensembles (PREs) are robust and interpretable statistical learning techniques with potential for predictive analytics, yet their efficacy in the presence of missing data is untested. This study uses multiple imputation to…

Applications · Statistics 2024-10-22 Vincent Schroeder, Jakob Schwerter, Marjolein Fokkema, Philipp Doebler

Missing Pattern Recognized Diffusion Imputation Model for Missing Not At Random

Missing data frequently arises across diverse domains, including time-series and image domains. In the real world, missing occurrences often depend on the unobservable values themselves, which are referred to as Missing Not at Random…

Machine Learning · Computer Science 2026-05-26 Gyuwon Sim, Sumin Lee, Heesun Bae, Byeonghu Na, Doyun Kwon, Ju-Hee Hwang, Jae-Young Lim, Il-Chul Moon

A Comparison of Full Information Maximum Likelihood and Machine Learning Missing Data Analytical Methods in Growth Curve Modeling

Missing data are inevitable in longitudinal studies. Traditional methods, such as the full information maximum likelihood (FIML), are commonly used to handle ignorable missing data. However, they may lead to biased model estimation due to…

Applications · Statistics 2024-01-01 Dandan Tang, Xin Tong

Parametric MMD Estimation with Missing Values: Robustness to Missingness and Data Model Misspecification

In the missing data literature, the Maximum Likelihood Estimator (MLE) is celebrated for its ignorability property under missing at random (MAR) data. However, its sensitivity to misspecification of the (complete) data model, even under…

Methodology · Statistics 2025-09-23 Badr-Eddine Chérief-Abdellatif, Jeffrey Näf

DPER: Efficient Parameter Estimation for Randomly Missing Data

The missing data problem has been broadly studied in the last few decades and has various applications in different areas such as statistics or bioinformatics. Even though many methods have been developed to tackle this challenge, most of…

Machine Learning · Statistics 2021-06-10 Thu Nguyen, Khoi Minh Nguyen-Duy, Duy Ho Minh Nguyen, Binh T. Nguyen, Bruce Alan Wade

Statistical Inference with Different Missing-data Mechanisms

When data are missing due to at most one cause from some time to next time, we can make sampling distribution inferences about the parameter of the data by modeling the missing-data mechanism correctly. Proverbially, in case its mechanism…

Methodology · Statistics 2014-07-21 Kosuke Morikawa, Yutaka Kano

Handling missing data in a neural network approach for the identification of charged particles in a multilayer detector

Identification of charged particles in a multilayer detector by the energy loss technique may also be achieved by the use of a neural network. The performance of the network becomes worse when a large fraction of information is missing, for…

Methodology · Statistics 2020-04-14 S. Riggi, D. Riggi, F. Riggi

Model-based Clustering with Missing Not At Random Data

Model-based unsupervised learning, as any learning task, stalls as soon as missing data occurs. This is even more true when the missing data are informative, or said missing not at random (MNAR). In this paper, we propose model-based…

Machine Learning · Statistics 2023-12-25 Aude Sportisse, Matthieu Marbac, Fabien Laporte, Gilles Celeux, Claire Boyer, Julie Josse, Christophe Biernacki

Robust Optimal Designs when Missing Data Happen at Random

In this article, we investigate the robust optimal design problem for the prediction of response when the fitted regression models are only approximately specified, and observations might be missing completely at random. The intuitive idea…

Methodology · Statistics 2022-10-19 Rui Hu, Ion Bica, Zhichun Zhai