Related papers: Estimation beyond Missing (Completely) at Random

High-dimensional estimation with missing data: Statistical and computational limits

We consider computationally-efficient estimation of population parameters when observations are subject to missing data. In particular, we consider estimation under the realizable contamination model of missing data in which an $\epsilon$…

Statistics Theory · Mathematics 2026-03-18 Kabir Aladin Verchand , Ankit Pensia , Saminul Haque , Rohith Kuditipudi

High-Dimensional Gaussian Mean Estimation under Realizable Contamination

We study mean estimation for a Gaussian distribution with identity covariance in $\mathbb{R}^d$ under a missing data scheme termed realizable $\epsilon$-contamination model. In this model an adversary can choose a function $r(x)$ between 0…

Machine Learning · Computer Science 2026-03-18 Ilias Diakonikolas , Daniel M. Kane , Thanasis Pittas

Estimating Average Causal Effects with Incomplete Exposure and Confounders

Standard methods for estimating average causal effects require complete observations of the exposure and confounders. In observational studies, however, missing data are ubiquitous. Motivated by a study on the effect of prescription opioids…

Methodology · Statistics 2025-06-30 Lan Wen , Glen McGee

Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility

Given a set of incomplete observations, we study the nonparametric problem of testing whether data are Missing Completely At Random (MCAR). Our first contribution is to characterise precisely the set of alternatives that can be…

Statistics Theory · Mathematics 2022-05-19 Thomas B Berrett , Richard J Samworth

Parametric MMD Estimation with Missing Values: Robustness to Missingness and Data Model Misspecification

In the missing data literature, the Maximum Likelihood Estimator (MLE) is celebrated for its ignorability property under missing at random (MAR) data. However, its sensitivity to misspecification of the (complete) data model, even under…

Methodology · Statistics 2025-09-23 Badr-Eddine Chérief-Abdellatif , Jeffrey Näf

Estimation and imputation in Probabilistic Principal Component Analysis with Missing Not At Random data

Missing Not At Random (MNAR) values lead to significant biases in the data, since the probability of missingness depends on the unobserved values.They are ''not ignorable'' in the sense that they often require defining a model for the…

Statistics Theory · Mathematics 2020-06-11 Aude Sportisse , Claire Boyer , Julie Josse

Sufficient Identification Conditions and Semiparametric Estimation under Missing Not at Random Mechanisms

Conducting valid statistical analyses is challenging in the presence of missing-not-at-random (MNAR) data, where the missingness mechanism is dependent on the missing values themselves even conditioned on the observed data. Here, we…

Methodology · Statistics 2023-06-13 Anna Guo , Jiwei Zhao , Razieh Nabi

Estimating Treatment Effects with Missings Not At Random in the Estimand Framework using Causal Inference

The analysis of randomized trials is often complicated by the occurrence of intercurrent events and missing values. Even though there are different strategies to address missing values it is still common to require missing values…

Methodology · Statistics 2025-11-11 A. Ruiz de Villa , Ll. Badiella

Prediction with Missing Data: Target Probabilities and Missingness Mechanisms

Conditions ensuring optimal parameter estimation in the presence of missing data are well established in inference, typically relying on the Missing-at-Random (MAR) assumption. In prediction, similar principles are often assumed to apply.…

Methodology · Statistics 2026-03-19 Pierre Catoire , Robin Genuer , Cecile Proust-Lima

Minimax rate of consistency for linear models with missing values

Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...). In fact, the very nature of missing values usually…

Machine Learning · Statistics 2022-02-04 Alexis Ayme , Claire Boyer , Aymeric Dieuleveut , Erwan Scornet

Random features models: a way to study the success of naive imputation

Constant (naive) imputation is still widely used in practice as this is a first easy-to-use technique to deal with missing data. Yet, this simple method could be expected to induce a large bias for prediction purposes, as the imputed input…

Statistics Theory · Mathematics 2024-02-07 Alexis Ayme , Claire Boyer , Aymeric Dieuleveut , Erwan Scornet

Asymptotics of Nonparametric Estimation under general non-monotone MAR missingness: A Bayesian Approach

Missing values are ubiquitous in (data) science, with potential detrimental consequences for any statistical analysis. As a consequence, a wealth of methods and theoretical results have been developed in recent years. Still, many questions…

Statistics Theory · Mathematics 2026-03-25 Badr-Eddine Chérief-Abdellatif , Jeffrey Näf

Semiparametric Estimation with Data Missing Not at Random Using an Instrumental Variable

Missing data occur frequently in empirical studies in health and social sciences, often compromising our ability to make accurate inferences. An outcome is said to be missing not at random (MNAR) if, conditional on the observed variables,…

Methodology · Statistics 2019-01-23 BaoLuo Sun , Lan Liu , Wang Miao , Kathleen Wirth , James Robins , Eric Tchetgen Tchetgen

Developing robust methods to handle missing data in real-world applications effectively

Missing data is a pervasive challenge spanning diverse data types, including tabular, sensor data, time-series, images and so on. Its origins are multifaceted, resulting in various missing mechanisms. Prior research in this field has…

Machine Learning · Computer Science 2025-03-03 Youran Zhou , Mohamed Reda Bouadjenek , Sunil Aryal

Missing at Random or Not: A Semiparametric Testing Approach

Practical problems with missing data are common, and statistical methods have been developed concerning the validity and/or efficiency of statistical procedures. On a central focus, there have been longstanding interests on the mechanism…

Methodology · Statistics 2020-03-26 Rui Duan , C. Jason Liang , Pamela Shaw , Cheng Yong Tang , Yong Chen

Empirical likelihood for estimating equations with missing values

We consider an empirical likelihood inference for parameters defined by general estimating equations when some components of the random observations are subject to missingness. As the nature of the estimating equations is wide-ranging, we…

Statistics Theory · Mathematics 2009-03-05 Dong Wang , Song Xi Chen

Identifiable Deep Latent Variable Models for MNAR Data

Missing data is a ubiquitous challenge in data analysis, often leading to biased and inaccurate results. Traditional imputation methods usually assume that the missingness mechanism is missing-at-random (MAR), where the missingness is…

Methodology · Statistics 2026-03-30 Huiming Xie , Fei Xue , Xiao Wang

Missing Not at Random in Matrix Completion: The Effectiveness of Estimating Missingness Probabilities Under a Low Nuclear Norm Assumption

Matrix completion is often applied to data with entries missing not at random (MNAR). For example, consider a recommendation system where users tend to only reveal ratings for items they like. In this case, a matrix completion method that…

Machine Learning · Statistics 2019-10-30 Wei Ma , George H. Chen

Semiparametric Inference of the Complier Average Causal Effect with Nonignorable Missing Outcomes

Noncompliance and missing data often occur in randomized trials, which complicate the inference of causal effects. When both noncompliance and missing data are present, previous papers proposed moment and maximum likelihood estimators for…

Methodology · Statistics 2014-09-04 Hua Chen , Peng Ding , Zhi Geng , Xiao-Hua Zhou

A Unified Framework for Inference with General Missingness Patterns and Machine Learning Imputation

Pre-trained machine learning (ML) predictions have been increasingly used to complement incomplete data to enable downstream scientific inquiries, but their naive integration risks biased inferences. Recently, multiple methods have been…

Methodology · Statistics 2025-11-12 Xingran Chen , Tyler McCormick , Bhramar Mukherjee , Zhenke Wu