Related papers: Missing Pattern Recognized Diffusion Imputation Mo…

Identifiable Deep Latent Variable Models for MNAR Data

Missing data is a ubiquitous challenge in data analysis, often leading to biased and inaccurate results. Traditional imputation methods usually assume that the missingness mechanism is missing-at-random (MAR), where the missingness is…

Methodology · Statistics 2026-03-30 Huiming Xie, Fei Xue, Xiao Wang

MissDDIM: Deterministic and Efficient Conditional Diffusion for Tabular Data Imputation

Diffusion models have recently emerged as powerful tools for missing data imputation by modeling the joint distribution of observed and unobserved variables. However, existing methods, typically based on stochastic denoising diffusion…

Artificial Intelligence · Computer Science 2025-08-06 Youran Zhou, Mohamed Reda Bouadjenek, Sunil Aryal

Deep Generative Imputation Model for Missing Not At Random Data

Data analysis usually suffers from the Missing Not At Random (MNAR) problem, where the cause of the missingness is not fully observed. Compared to the naive Missing Completely At Random (MCAR) problem, it is more in line with the…

Machine Learning · Computer Science 2025-05-27 Jialei Chen, Yuanbo Xu, Pengyang Wang, Yongjian Yang

Identifiable Generative Models for Missing Not at Random Data Imputation

Real-world datasets often have missing values associated with complex generative processes, where the cause of the missingness may not be fully observed. This is known as missing not at random (MNAR) data. However, many imputation methods…

Machine Learning · Computer Science 2021-10-29 Chao Ma, Cheng Zhang

Causal View of Time Series Imputation: Some Identification Results on Missing Mechanism

Time series imputation is one of the most challenging problems and has broad applications in various fields like health care and the Internet of Things. Existing methods mainly aim to model the temporally latent dependencies and the…

Machine Learning · Computer Science 2025-05-13 Ruichu Cai, Kaitao Zheng, Junxian Huang, Zijian Li, Zhengming Chen, Boyan Xu, Zhifeng Hao

Sufficient Identification Conditions and Semiparametric Estimation under Missing Not at Random Mechanisms

Conducting valid statistical analyses is challenging in the presence of missing-not-at-random (MNAR) data, where the missingness mechanism is dependent on the missing values themselves even conditioned on the observed data. Here, we…

Methodology · Statistics 2023-06-13 Anna Guo, Jiwei Zhao, Razieh Nabi

Estimation and imputation in Probabilistic Principal Component Analysis with Missing Not At Random data

Missing Not At Random (MNAR) values lead to significant biases in the data, since the probability of missingness depends on the unobserved values. They are "not ignorable" in the sense that they often require defining a model for the…

Statistics Theory · Mathematics 2020-06-11 Aude Sportisse, Claire Boyer, Julie Josse

RefiDiff: Progressive Refinement Diffusion for Efficient Missing Data Imputation

Missing values in high-dimensional, mixed-type datasets pose significant challenges for data imputation, particularly under Missing Not At Random (MNAR) mechanisms. Existing methods struggle to integrate local and global data…

Machine Learning · Computer Science 2025-11-13 Md Atik Ahamed, Qiang Ye, Qiang Cheng

Latent Space Score-based Diffusion Model for Probabilistic Multivariate Time Series Imputation

Accurate imputation is essential for the reliability and success of downstream tasks. Recently, diffusion models have attracted great attention in this field. However, these models neglect the latent distribution in a lower-dimensional…

Machine Learning · Computer Science 2024-09-16 Guojun Liang, Najmeh Abiri, Atiye Sadat Hashemi, Jens Lundström, Stefan Byttner, Prayag Tiwari

Prediction with Missing Data: Target Probabilities and Missingness Mechanisms

Conditions ensuring optimal parameter estimation in the presence of missing data are well established in inference, typically relying on the Missing-at-Random (MAR) assumption. In prediction, similar principles are often assumed to apply.…

Methodology · Statistics 2026-03-19 Pierre Catoire, Robin Genuer, Cecile Proust-Lima

Identification, Doubly Robust Estimation, and Semiparametric Efficiency Theory of Nonignorable Missing Data With a Shadow Variable

We consider identification and estimation with an outcome missing not at random (MNAR). We study an identification strategy based on a so-called shadow variable. A shadow variable is assumed to be correlated with the outcome, but…

Methodology · Statistics 2019-09-10 Wang Miao, Lan Liu, Eric Tchetgen Tchetgen, Zhi Geng

Imputing With Predictive Mean Matching Can Be Severely Biased When Values Are Missing At Random

Predictive mean matching (PMM) is a popular imputation strategy that imputes missing values by borrowing observed values from other cases with similar expectations. We show that, unlike other imputation strategies, PMM is not guaranteed to…

Methodology · Statistics 2025-07-01 Paul T. von Hippel

Unsupervised representation learning with recognition-parametrised probabilistic models

We introduce a new approach to probabilistic unsupervised learning based on the recognition-parametrised model (RPM): a normalised semi-parametric hypothesis class for joint distributions over observed and latent variables. Under the key…

Machine Learning · Computer Science 2023-04-21 William I. Walker, Hugo Soulat, Changmin Yu, Maneesh Sahani

Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach

We consider the task of identifying and estimating a parameter of interest in settings where data is missing not at random (MNAR). In general, such parameters are not identified without strong assumptions on the missing data model. In this…

Methodology · Statistics 2024-02-29 Zixiao Wang, AmirEmad Ghassami, Ilya Shpitser

DiffPuter: Empowering Diffusion Models for Missing Data Imputation

Generative models play an important role in missing data imputation in that they aim to learn the joint distribution of full data. However, applying advanced deep generative models (such as Diffusion models) to missing data imputation is…

Machine Learning · Computer Science 2025-05-27 Hengrui Zhang, Liancheng Fang, Qitian Wu, Philip S. Yu

Recursive Equations For Imputation Of Missing Not At Random Data With Sparse Pattern Support

A common approach for handling missing values in data analysis pipelines is multiple imputation via software packages such as MICE (Van Buuren and Groothuis-Oudshoorn, 2011) and Amelia (Honaker et al., 2011). These packages typically assume…

Methodology · Statistics 2025-07-23 Trung Phung, Kyle Reese, Ilya Shpitser, Rohit Bhattacharya

AMM-Diff: Adaptive Multi-Modality Diffusion Network for Missing Modality Imputation

In clinical practice, full imaging is not always feasible, often due to complex acquisition protocols, stringent privacy regulations, or specific clinical needs. However, missing MR modalities pose significant challenges for tasks like…

Computer Vision and Pattern Recognition · Computer Science 2025-01-23 Aghiles Kebaili, Jérôme Lapuyade-Lahorgue, Pierre Vera, Su Ruan

Model-based Clustering with Missing Not At Random Data

Model-based unsupervised learning, like any learning task, stalls as soon as missing data occurs. This is even more true when the missing data are informative, that is, missing not at random (MNAR). In this paper, we propose model-based…

Machine Learning · Statistics 2023-12-25 Aude Sportisse, Matthieu Marbac, Fabien Laporte, Gilles Celeux, Claire Boyer, Julie Josse, Christophe Biernacki

Multiple imputation with missing data indicators

Multiple imputation is a well-established general technique for analyzing data with missing values. A convenient way to implement multiple imputation is sequential regression multiple imputation (SRMI), also called chained equations…

Methodology · Statistics 2021-03-04 Lauren J Beesley, Irina Bondarenko, Michael R Elliott, Allison W Kurian, Steven J Katz, Jeremy M G Taylor

Missing Data and Prediction

Missing data are a common problem for both the construction and implementation of a prediction algorithm. Pattern mixture kernel submodels (PMKS) - a series of submodels for every missing data pattern that are fit using only data from that…

Methodology · Statistics 2017-04-27 Sarah Fletcher Mercaldo, Jeffrey D. Blume