Related papers: Sharing pattern submodels for prediction with miss…

200 papers

In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here,…

Machine Learning · Statistics 2024-03-22 Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux

Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...). In fact, the very nature of missing values usually…

Machine Learning · Statistics 2022-02-04 Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly more expensive to evaluate and the…

Machine Learning · Computer Science 2020-10-21 Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

Conditions ensuring optimal parameter estimation in the presence of missing data are well established in inference, typically relying on the Missing-at-Random (MAR) assumption. In prediction, similar principles are often assumed to apply.…

Methodology · Statistics 2026-03-19 Pierre Catoire, Robin Genuer, Cecile Proust-Lima

In this thesis we discuss machine learning methods performing automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in the predictive models. By relying on a limited set of…

Machine Learning · Computer Science 2019-03-27 Magda Gregorova

Training datasets for machine learning often have some form of missingness. For example, to learn a model for deciding whom to give a loan, the available training data includes individuals who were given a loan in the past, but not those…

Machine Learning · Computer Science 2020-12-22 Naman Goel, Alfonso Amayuelas, Amit Deshpande, Amit Sharma

We investigate model based classification with partially labelled training data. In many biostatistical applications, labels are manually assigned by experts, who may leave some observations unlabelled due to class uncertainty. We analyse…

Methodology · Statistics 2019-04-08 Daniel Ahfock, Geoffrey J. McLachlan

Machine learning (ML) has become a ubiquitous tool across various domains of data mining and big data analysis. The efficacy of ML models depends heavily on high-quality datasets, which are often complicated by the presence of missing…

Machine Learning · Computer Science 2024-10-14 Abu Fuad Ahmad, Md Shohel Sayeed, Khaznah Alshammari, Istiaque Ahmed

As one of the most commonly seen data challenges, missing data, in particular, multiple, non-monotone missing patterns, complicates estimation and inference due to the fact that missingness mechanisms are often not missing at random, and…

Methodology · Statistics 2025-04-21 Jianing Dong, Raymond K. W. Wong, Kwun Chuen Gary Chan

Handling missing values at test time is challenging for machine learning models, especially when aiming for both high accuracy and interpretability. Established approaches often add bias through imputation or excessive model complexity via…

Machine Learning · Computer Science 2025-05-07 Lena Stempfle, Anton Matsson, Newton Mwai, Fredrik D. Johansson

Missing values in datasets are common in applied statistics. For regression problems, theoretical work thus far has largely considered the issue of missing covariates as distinct from missing responses. However, in practice, many datasets…

Statistics Theory · Mathematics 2026-02-17 Benedict M. Risebrow, Thomas B. Berrett

This paper develops an inferential theory for high-dimensional matrix-variate factor models with missing observations. We propose an easy-to-use all-purpose method that involves two straightforward steps. First, we perform principal…

Methodology · Statistics 2025-03-26 Yongxia Zhang, Jinwen Liang, Liwen Xu, Keming Yu, Maozai Tian

We consider the problem of parameter estimation using weakly supervised datasets, where a training sample consists of the input and a partially specified annotation, which we refer to as the output. The missing information in the annotation…

Machine Learning · Computer Science 2012-06-22 M. Pawan Kumar, Ben Packer, Daphne Koller

We identify a phenomenon, which we refer to as multi-model forgetting, that occurs when sequentially training multiple deep networks with partially-shared parameters; the performance of previously-trained models degrades as one optimizes a…

Datasets with missing values are very common in industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative…

Machine Learning · Computer Science 2019-02-28 Ramiro D. Camino, Christian A. Hammerschmidt, Radu State

Sparse linear regression (finding an unknown vector from linear measurements) is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where…

Machine Learning · Computer Science 2012-02-28 Ali Jalali, Pradeep Ravikumar, Sujay Sanghavi

Machine learning models have achieved widespread success but often inherit and amplify historical biases, resulting in unfair outcomes. Traditional fairness methods typically impose constraints at the prediction level, without addressing…

Machine Learning · Statistics 2026-02-10 Enze Shi, Pankaj Bhagwat, Zhixian Yang, Linglong Kong, Bei Jiang

Many machine learning algorithms are based on the assumption that training examples are drawn independently. However, this assumption does not hold anymore when learning from a networked sample because two or more training examples may…

Artificial Intelligence · Computer Science 2017-06-06 Yuyi Wang, Jan Ramon, Zheng-Chu Guo

While discriminative classifiers often yield strong predictive performance, missing feature values at prediction time can still be a challenge. Classifiers may not behave as expected under certain ways of substituting the missing values,…

Machine Learning · Computer Science 2019-06-04 Pasha Khosravi, Yitao Liang, YooJung Choi, Guy Van den Broeck

A variety of machine learning tasks (e.g., matrix factorization, topic modelling, and feature allocation) can be viewed as learning the parameters of a probability distribution over bipartite graphs. Recently, a new class of models for…

Machine Learning · Statistics 2017-12-07 Victor Veitch, Ekansh Sharma, Zacharie Naulet, Daniel M. Roy