Related papers: Sharing pattern submodels for prediction with miss…

On the consistency of supervised learning with missing values

In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here,…

Machine Learning · Statistics 2024-03-22 Julie Josse , Jacob M. Chen , Nicolas Prost , Erwan Scornet , Gaël Varoquaux

Minimax rate of consistency for linear models with missing values

Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...). In fact, the very nature of missing values usually…

Machine Learning · Statistics 2022-02-04 Alexis Ayme , Claire Boyer , Aymeric Dieuleveut , Erwan Scornet

Model-specific Data Subsampling with Influence Functions

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly more expensive to evaluate and the…

Machine Learning · Computer Science 2020-10-21 Anant Raj , Cameron Musco , Lester Mackey , Nicolo Fusi

Prediction with Missing Data: Target Probabilities and Missingness Mechanisms

Conditions ensuring optimal parameter estimation in the presence of missing data are well established in inference, typically relying on the Missing-at-Random (MAR) assumption. In prediction, similar principles are often assumed to apply.…

Methodology · Statistics 2026-03-19 Pierre Catoire , Robin Genuer , Cecile Proust-Lima

Sparse Learning for Variable Selection with Structures and Nonlinearities

In this thesis we discuss machine learning methods performing automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in the predictive models. By relying on a limited set of…

Machine Learning · Computer Science 2019-03-27 Magda Gregorova

The Importance of Modeling Data Missingness in Algorithmic Fairness: A Causal Perspective

Training datasets for machine learning often have some form of missingness. For example, to learn a model for deciding whom to give a loan, the available training data includes individuals who were given a loan in the past, but not those…

Machine Learning · Computer Science 2020-12-22 Naman Goel , Alfonso Amayuelas , Amit Deshpande , Amit Sharma

On missing label patterns in semi-supervised learning

We investigate model based classification with partially labelled training data. In many biostatistical applications, labels are manually assigned by experts, who may leave some observations unlabelled due to class uncertainty. We analyse…

Methodology · Statistics 2019-04-08 Daniel Ahfock , Geoffrey J. McLachlan

Impact of Missing Values in Machine Learning: A Comprehensive Analysis

Machine learning (ML) has become a ubiquitous tool across various domains of data mining and big data analysis. The efficacy of ML models depends heavily on high-quality datasets, which are often complicated by the presence of missing…

Machine Learning · Computer Science 2024-10-14 Abu Fuad Ahmad , Md Shohel Sayeed , Khaznah Alshammari , Istiaque Ahmed

Efficient Estimation under Multiple Missing Patterns via Balancing Weights

As one of the most commonly seen data challenges, missing data, in particular, multiple, non-monotone missing patterns, complicates estimation and inference due to the fact that missingness mechanisms are often not missing at random, and…

Methodology · Statistics 2025-04-21 Jianing Dong , Raymond K. W. Wong , Kwun Chuen Gary Chan

Prediction Models That Learn to Avoid Missing Values

Handling missing values at test time is challenging for machine learning models, especially when aiming for both high accuracy and interpretability. Established approaches often add bias through imputation or excessive model complexity via…

Machine Learning · Computer Science 2025-05-07 Lena Stempfle , Anton Matsson , Newton Mwai , Fredrik D. Johansson

Semi-supervised linear regression with missing covariates

Missing values in datasets are common in applied statistics. For regression problems, theoretical work thus far has largely considered the issue of missing covariates as distinct from missing responses. However, in practice, many datasets…

Statistics Theory · Mathematics 2026-02-17 Benedict M. Risebrow , Thomas B. Berrett

Statistical Inference for High-dimensional Matrix-variate Factor Models with Missing Observations

This paper develops an inferential theory for high-dimensional matrix-variate factor models with missing observations. We propose an easy-to-use all-purpose method that involves two straightforward steps. First, we perform principal…

Methodology · Statistics 2025-03-26 Yongxia Zhang , Jinwen Liang , Liwen Xu , Keming Yu , Maozai Tian

Modeling Latent Variable Uncertainty for Loss-based Learning

We consider the problem of parameter estimation using weakly supervised datasets, where a training sample consists of the input and a partially specified annotation, which we refer to as the output. The missing information in the annotation…

Machine Learning · Computer Science 2012-06-22 M. Pawan Kumar , Ben Packer , Daphne Koller

Overcoming Multi-Model Forgetting

We identify a phenomenon, which we refer to as multi-model forgetting, that occurs when sequentially training multiple deep networks with partially-shared parameters; the performance of previously-trained models degrades as one optimizes a…

Machine Learning · Computer Science 2019-03-05 Yassine Benyahia , Kaicheng Yu , Kamil Bennani-Smires , Martin Jaggi , Anthony Davison , Mathieu Salzmann , Claudiu Musat

Improving Missing Data Imputation with Deep Generative Models

Datasets with missing values are very common on industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative…

Machine Learning · Computer Science 2019-02-28 Ramiro D. Camino , Christian A. Hammerschmidt , Radu State

A Dirty Model for Multiple Sparse Regression

Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where…

Machine Learning · Computer Science 2012-02-28 Ali Jalali , Pradeep Ravikumar , Sujay Sanghavi

Understanding Fairness and Prediction Error through Subspace Decomposition and Influence Analysis

Machine learning models have achieved widespread success but often inherit and amplify historical biases, resulting in unfair outcomes. Traditional fairness methods typically impose constraints at the prediction level, without addressing…

Machine Learning · Statistics 2026-02-10 Enze Shi , Pankaj Bhagwat , Zhixian Yang , Linglong Kong , Bei Jiang

Learning from networked examples

Many machine learning algorithms are based on the assumption that training examples are drawn independently. However, this assumption does not hold anymore when learning from a networked sample because two or more training examples may…

Artificial Intelligence · Computer Science 2017-06-06 Yuyi Wang , Jan Ramon , Zheng-Chu Guo

What to Expect of Classifiers? Reasoning about Logistic Regression with Missing Features

While discriminative classifiers often yield strong predictive performance, missing feature values at prediction time can still be a challenge. Classifiers may not behave as expected under certain ways of substituting the missing values,…

Machine Learning · Computer Science 2019-06-04 Pasha Khosravi , Yitao Liang , YooJung Choi , Guy Van den Broeck

Exchangeable modelling of relational data: checking sparsity, train-test splitting, and sparse exchangeable Poisson matrix factorization

A variety of machine learning tasks---e.g., matrix factorization, topic modelling, and feature allocation---can be viewed as learning the parameters of a probability distribution over bipartite graphs. Recently, a new class of models for…

Machine Learning · Statistics 2017-12-07 Victor Veitch , Ekansh Sharma , Zacharie Naulet , Daniel M. Roy