Related papers: NeuMiss networks: differentiable programming for s…

What's a good imputation to predict with missing values?

How to learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and second learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical…

Machine Learning · Statistics 2021-12-01 Marine Le Morvan, Julie Josse, Erwan Scornet, Gaël Varoquaux

Minimax rate of consistency for linear models with missing values

Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...). In fact, the very nature of missing values usually…

Machine Learning · Statistics 2022-02-04 Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet

Prediction with Missing Data: Target Probabilities and Missingness Mechanisms

Conditions ensuring optimal parameter estimation in the presence of missing data are well established in inference, typically relying on the Missing-at-Random (MAR) assumption. In prediction, similar principles are often assumed to apply.…

Methodology · Statistics 2026-03-19 Pierre Catoire, Robin Genuer, Cecile Proust-Lima

Missing Data and Prediction

Missing data are a common problem for both the construction and implementation of a prediction algorithm. Pattern mixture kernel submodels (PMKS) - a series of submodels for every missing data pattern that are fit using only data from that…

Methodology · Statistics 2017-04-27 Sarah Fletcher Mercaldo, Jeffrey D. Blume

A Unifying Framework for Robust and Efficient Inference with Unstructured Data

To analyze unstructured data (text, images, audio, video), economists typically first extract low-dimensional structured features with a neural network. Neural networks do not make generically unbiased predictions, and biases will propagate…

Econometrics · Economics 2026-02-20 Jacob Carlson, Melissa Dell

Identifiable Deep Latent Variable Models for MNAR Data

Missing data is a ubiquitous challenge in data analysis, often leading to biased and inaccurate results. Traditional imputation methods usually assume that the missingness mechanism is missing-at-random (MAR), where the missingness is…

Methodology · Statistics 2026-03-30 Huiming Xie, Fei Xue, Xiao Wang

The AI&M Procedure for Learning from Incomplete Data

We investigate methods for parameter learning from incomplete data that is not missing at random. Likelihood-based methods then require the optimization of a profile likelihood that takes all possible missingness mechanisms into account.…

Methodology · Statistics 2012-07-02 Manfred Jaeger

Linear predictor on linearly-generated data with missing values: non consistency and solutions

We consider building predictors when the data have missing values. We study the seemingly-simple case where the target to predict is a linear function of the fully-observed data and we show that, in the presence of missing values, the…

Machine Learning · Computer Science 2020-07-02 Marine Le Morvan, Nicolas Prost, Julie Josse, Erwan Scornet, Gaël Varoquaux

Sufficient Identification Conditions and Semiparametric Estimation under Missing Not at Random Mechanisms

Conducting valid statistical analyses is challenging in the presence of missing-not-at-random (MNAR) data, where the missingness mechanism is dependent on the missing values themselves even conditioned on the observed data. Here, we…

Methodology · Statistics 2023-06-13 Anna Guo, Jiwei Zhao, Razieh Nabi

Missing Pattern Recognized Diffusion Imputation Model for Missing Not At Random

Missing data frequently arises across diverse domains, including time-series and image domains. In the real world, missing occurrences often depend on the unobservable values themselves, which are referred to as Missing Not at Random…

Machine Learning · Computer Science 2026-05-26 Gyuwon Sim, Sumin Lee, Heesun Bae, Byeonghu Na, Doyun Kwon, Ju-Hee Hwang, Jae-Young Lim, Il-Chul Moon

Sharing pattern submodels for prediction with missing values

Missing values are unavoidable in many applications of machine learning and present challenges both during training and at test time. When variables are missing in recurring patterns, fitting separate pattern submodels has been proposed as…

Machine Learning · Computer Science 2023-11-27 Lena Stempfle, Ashkan Panahi, Fredrik D. Johansson

Missing Not at Random in Matrix Completion: The Effectiveness of Estimating Missingness Probabilities Under a Low Nuclear Norm Assumption

Matrix completion is often applied to data with entries missing not at random (MNAR). For example, consider a recommendation system where users tend to only reveal ratings for items they like. In this case, a matrix completion method that…

Machine Learning · Statistics 2019-10-30 Wei Ma, George H. Chen

Rethinking GNNs and Missing Features: Challenges, Evaluation and a Robust Solution

Handling missing node features is a key challenge for deploying Graph Neural Networks (GNNs) in real-world domains such as healthcare and sensor networks. Existing studies mostly address relatively benign scenarios, namely benchmark…

Machine Learning · Computer Science 2026-05-19 Francesco Ferrini, Veronica Lachi, Antonio Longa, Bruno Lepri, Matono Akiyoshi, Andrea Passerini, Xin Liu, Manfred Jaeger

Prediction with Incomplete Data under Agnostic Mask Distribution Shift

Data with missing values is ubiquitous in many applications. Recent years have witnessed increasing attention on prediction with only incomplete data consisting of observed features and a mask that indicates the missing pattern. Existing…

Machine Learning · Computer Science 2023-05-22 Yichen Zhu, Jian Yuan, Bo Jiang, Tao Lin, Haiming Jin, Xinbing Wang, Chenghu Zhou

Parametric MMD Estimation with Missing Values: Robustness to Missingness and Data Model Misspecification

In the missing data literature, the Maximum Likelihood Estimator (MLE) is celebrated for its ignorability property under missing at random (MAR) data. However, its sensitivity to misspecification of the (complete) data model, even under…

Methodology · Statistics 2025-09-23 Badr-Eddine Chérief-Abdellatif, Jeffrey Näf

On the consistency of supervised learning with missing values

In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here,…

Machine Learning · Statistics 2024-03-22 Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux

Neural Parameter Estimation with Incomplete Data

Advances in artificial intelligence (AI) and deep learning have led to neural networks being used to generate lightning-speed answers to complex science questions, paintings in the style of Monet, or stories like those of Twain. Leveraging…

Methodology · Statistics 2026-02-25 Matthew Sainsbury-Dale, Andrew Zammit-Mangion, Noel Cressie, Raphaël Huser

Missing Data: A Comparison of Neural Network and Expectation Maximisation Techniques

The estimation of missing input vector elements in real time processing applications requires a system that possesses the knowledge of certain characteristics such as correlations between variables, which are inherent in the input space.…

Applications · Statistics 2007-05-23 Fulufhelo V. Nelwamondo, Shakir Mohamed, Tshilidzi Marwala

PROMISSING: Pruning Missing Values in Neural Networks

While data are the primary fuel for machine learning models, they often suffer from missing values, especially when collected in real-world scenarios. However, many off-the-shelf machine learning models, including artificial neural network…

Machine Learning · Computer Science 2022-06-06 Seyed Mostafa Kia, Nastaran Mohammadian Rad, Daniel van Opstal, Bart van Schie, Andre F. Marquand, Josien Pluim, Wiepke Cahn, Hugo G. Schnack

Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data

We propose an efficient family of algorithms to learn the parameters of a Bayesian network from incomplete data. In contrast to textbook approaches such as EM and the gradient method, our approach is non-iterative, yields closed form…

Machine Learning · Computer Science 2014-11-26 Guy Van den Broeck, Karthika Mohan, Arthur Choi, Judea Pearl