Related papers: High-dimensional estimation with missing data: Sta…

200 papers

We study mean estimation for a Gaussian distribution with identity covariance in $\mathbb{R}^d$ under a missing-data scheme termed the realizable $\epsilon$-contamination model. In this model, an adversary can choose a function $r(x)$ between 0…

Machine Learning · Computer Science · 2026-03-18 · Ilias Diakonikolas, Daniel M. Kane, Thanasis Pittas

We study the effects of missingness on the estimation of population parameters. Moving beyond restrictive missing completely at random (MCAR) assumptions, we first formulate a missing-data analogue of Huber's arbitrary…

Statistics Theory · Mathematics · 2026-04-28 · Tianyi Ma, Kabir A. Verchand, Thomas B. Berrett, Tengyao Wang, Richard J. Samworth

Although a majority of the theoretical literature in high-dimensional statistics has focused on settings that involve fully observed data, settings with missing values and corruptions are common in practice. We consider the problems of…

Machine Learning · Statistics · 2017-11-06 · Yining Wang, Jialei Wang, Sivaraman Balakrishnan, Aarti Singh

Data analysis usually suffers from the Missing Not At Random (MNAR) problem, in which the cause of the missingness is not fully observed. Compared with the naive Missing Completely At Random (MCAR) problem, it is more in line with the…

Machine Learning · Computer Science · 2025-05-27 · Jialei Chen, Yuanbo Xu, Pengyang Wang, Yongjian Yang

We study the fundamental problems of Gaussian mean estimation and linear regression with Gaussian covariates in the presence of Huber contamination. Our main contribution is the design of the first sample near-optimal and almost linear-time…

Data Structures and Algorithms · Computer Science · 2023-12-05 · Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia, Thanasis Pittas

Missing data is a ubiquitous challenge in data analysis, often leading to biased and inaccurate results. Traditional imputation methods usually assume that the missingness mechanism is missing-at-random (MAR), where the missingness is…

Methodology · Statistics · 2026-03-30 · Huiming Xie, Fei Xue, Xiao Wang

Missing data arise in most applied settings and are ubiquitous in electronic health records (EHR). When data are missing not at random (MNAR) with respect to measured covariates, sensitivity analyses are often considered. These post-hoc…

Methodology · Statistics · 2023-07-11 · Alexander W. Levis, Rajarshi Mukherjee, Rui Wang, Heidi Fischer, Sebastien Haneuse

This paper reviews recent advances in missing-data research using graphical models to represent multivariate dependencies. We first examine the limitations of traditional frameworks from three different perspectives: transparency,…

Methodology · Statistics · 2019-11-15 · Karthika Mohan, Judea Pearl

Missing data are frequently encountered in high-dimensional problems, but they are usually difficult to deal with using standard algorithms, such as the expectation-maximization (EM) algorithm and its variants. To tackle this difficulty,…

Methodology · Statistics · 2018-02-08 · Faming Liang, Bochao Jia, Jingnan Xue, Qizhai Li, Ye Luo

We propose an $\ell_1$-regularized likelihood method for estimating the inverse covariance matrix in the high-dimensional multivariate normal model in the presence of missing data. Our method is based on the assumption that the data are missing at…

Methodology · Statistics · 2012-02-28 · Nicolas Städler, Peter Bühlmann

We provide efficient algorithms for the problem of distribution learning from high-dimensional Gaussian data where, in each sample, some of the variable values are missing. We suppose that the variables are missing not at random (MNAR). The…

Machine Learning · Computer Science · 2025-04-29 · Arnab Bhattacharyya, Constantinos Daskalakis, Themis Gouleakis, Yuhao Wang

In this paper, we study covariance estimation with missing data. We consider missing-data mechanisms that can be independent of the data or have a time-varying dependency. Additionally, observed variables may have arbitrary (non-uniform)…

Statistics Theory · Mathematics · 2021-06-17 · Eduardo Pavez, Antonio Ortega

Conducting valid statistical analyses is challenging in the presence of missing-not-at-random (MNAR) data, where the missingness mechanism depends on the missing values themselves, even conditional on the observed data. Here, we…

Methodology · Statistics · 2023-06-13 · Anna Guo, Jiwei Zhao, Razieh Nabi

Missing data can lead to inefficiencies and biases in analyses, in particular when data are missing not at random (MNAR). It is thus vital to understand and correctly identify the missing data mechanism. Recovering missing values through a…

Methodology · Statistics · 2022-12-08 · Jack Noonan, Adetola Adedamola Adediran, Robin Mitra, Stefanie Biedermann

Although the standard formulations of prediction problems involve fully observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly with dependence as well. We study these…

Statistics Theory · Mathematics · 2015-03-19 · Po-Ling Loh, Martin J. Wainwright

To model modern large-scale datasets, we need efficient algorithms to infer a set of $P$ unknown model parameters from $N$ noisy measurements. What are the fundamental limits on the accuracy of parameter inference, given finite signal-to-noise…

Machine Learning · Statistics · 2016-09-07 · Madhu Advani, Surya Ganguli

The missing data issue often complicates the task of estimating generalized linear models (GLMs). We describe why the pseudo-marginal Metropolis-Hastings algorithm, used in this setting, is an effective strategy for parameter estimation.…

Methodology · Statistics · 2019-07-23 · Taylor R. Brown, Timothy L. McMurry, Alexander Langevin

Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the…

Methodology · Statistics · 2016-05-17 · T. Tony Cai, Anru Zhang

We consider identification and estimation with an outcome missing not at random (MNAR). We study an identification strategy based on a so-called shadow variable. A shadow variable is assumed to be correlated with the outcome, but…

Methodology · Statistics · 2019-09-10 · Wang Miao, Lan Liu, Eric Tchetgen Tchetgen, Zhi Geng

The multivariate Gaussian is often used as a first approximation to the distribution of high-dimensional data. Determining the parameters of this distribution under various constraints is a widely studied problem in statistics, and is often…

Statistics Theory · Mathematics · 2016-02-09 · Samuel Balmand, Arnak Dalalyan