Related papers: Statistical Inference for High-Dimensional Linear …

Rate Optimal Estimation and Confidence Intervals for High-dimensional Regression with Missing Covariates

Although a majority of the theoretical literature in high-dimensional statistics has focused on settings which involve fully-observed data, settings with missing values and corruptions are common in practice. We consider the problems of…

Machine Learning · Statistics 2017-11-06 Yining Wang, Jialei Wang, Sivaraman Balakrishnan, Aarti Singh

Imputations for High Missing Rate Data in Covariates via Semi-supervised Learning Approach

Advancements in data collection techniques and the heterogeneity of data resources can yield high percentages of missing observations on variables, such as block-wise missing data. Under missing-data scenarios, traditional methods such as…

Methodology · Statistics 2022-05-17 Wei Lan, Xuerong Chen, Tao Zou, Chih-Ling Tsai

Prediction approaches for partly missing multi-omics covariate data: A literature review and an empirical comparison study

As the availability of omics data has increased in the last few years, more multi-omics data have been generated, that is, high-dimensional molecular data consisting of several types such as genomic, transcriptomic, or proteomic data, all…

Genomics · Quantitative Biology 2023-02-09 Roman Hornung, Frederik Ludwigs, Jonas Hagenberg, Anne-Laure Boulesteix

Regression analysis of longitudinal data with mixed synchronous and asynchronous longitudinal covariates

In linear models, omitting a covariate that is orthogonal to covariates in the model does not result in biased coefficient estimation. This in general does not hold for longitudinal data, where additional assumptions are needed to get…

Statistics Theory · Mathematics 2023-05-30 Zhuowei Sun, Hongyuan Cao, Li Chen, Jason P. Fine

A Projection Approach to Local Regression with Variable-Dimension Covariates

Incomplete covariate vectors are known to be problematic for estimation and inferences on model parameters, but their impact on prediction performance is less understood. We develop an imputation-free method that builds on a random…

Methodology · Statistics 2024-05-31 Matthew J. Heiner, Garritt L. Page, Fernando Andrés Quintana

Improved Estimators for Semi-supervised High-dimensional Regression Model

We study a linear high-dimensional regression model in a semi-supervised setting, where for many observations only the vector of covariates $X$ is given with no response $Y$. We do not make any sparsity assumptions on the vector of…

Statistics Theory · Mathematics 2021-09-03 Ilan Livne, David Azriel, Yair Goldberg

Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data

Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the…

Methodology · Statistics 2016-05-17 T. Tony Cai, Anru Zhang

Blockwise Missingness meets AI: A Tractable Solution for Semiparametric Inference

We consider parameter estimation and inference when data feature blockwise, non-monotone missingness. Our approach, rooted in semiparametric theory and inspired by prediction-powered inference, leverages off-the-shelf AI (predictive or…

Methodology · Statistics 2025-09-30 Qi Xu, Lorenzo Testa, Jing Lei, Kathryn Roeder

Semi-supervised linear regression with missing covariates

Missing values in datasets are common in applied statistics. For regression problems, theoretical work thus far has largely considered the issue of missing covariates as distinct from missing responses. However, in practice, many datasets…

Statistics Theory · Mathematics 2026-02-17 Benedict M. Risebrow, Thomas B. Berrett

Doubly Debiased Lasso: High-Dimensional Inference under Hidden Confounding

Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting, where the measured covariates are affected…

Methodology · Statistics 2021-07-22 Zijian Guo, Domagoj Ćevid, Peter Bühlmann

A zero-estimator approach for estimating the signal level in a high-dimensional model-free setting

We study a high-dimensional regression setting under the assumption of known covariate distribution. We aim at estimating the amount of explained variation in the response by the best linear function of the covariates (the signal level). In…

Statistics Theory · Mathematics 2022-05-12 Ilan Livne, David Azriel, Yair Goldberg

Covariance Matrix Estimation with Non Uniform and Data Dependent Missing Observations

In this paper, we study covariance estimation with missing data. We consider missing data mechanisms that can be independent of the data, or have a time-varying dependency. Additionally, observed variables may have arbitrary (non-uniform)…

Statistics Theory · Mathematics 2021-06-17 Eduardo Pavez, Antonio Ortega

Semiparametric regression and risk prediction with competing risks data under missing cause of failure

The cause of failure in cohort studies that involve competing risks is frequently incompletely observed. To address this, several methods have been proposed for the semiparametric proportional cause-specific hazards model under a missing at…

Methodology · Statistics 2020-02-24 Giorgos Bakoyannis, Ying Zhang, Constantin T. Yiannoutsos

Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates

The standard quantile regression model assumes a linear relationship at the quantile of interest and that all variables are observed. We relax these assumptions by considering a partial linear model while allowing for missing linear…

Methodology · Statistics 2016-06-07 Ben Sherwood

Robust covariance estimation with missing values and cell-wise contamination

Large datasets are often affected by cell-wise outliers in the form of missing or erroneous data. However, discarding any samples containing outliers may result in a dataset that is too small to accurately estimate the covariance matrix.…

Statistics Theory · Mathematics 2023-11-13 Karim Lounici, Grégoire Pacreau

Sparse Linear Regression With Missing Data

This paper proposes a fast and accurate method for sparse regression in the presence of missing data. The underlying statistical model encapsulates the low-dimensional structure of the incomplete data matrix and the sparsity of the…

Machine Learning · Statistics 2015-03-31 Ravi Ganti, Rebecca M. Willett

Weighted empirical likelihood for quantile regression with nonignorable missing covariates

In this paper, we propose an empirical likelihood-based weighted estimator of the regression parameter in a quantile regression model with nonignorable missing covariates. The proposed estimator is computationally simple and achieves…

Methodology · Statistics 2017-10-10 Xiaohui Yuan, Xiaogang Dong

Efficient Semiparametric Inference for Distributed Data with Blockwise Missingness

We consider statistical inference for a finite-dimensional parameter in a regular semiparametric model under a distributed setting with blockwise missingness, where entire blocks of variables are unavailable at certain sites and sharing…

Methodology · Statistics 2025-08-26 Jingyue Huang, Huiyuan Wang, Yuqing Lei, Yong Chen

Debiased regression adjustment in completely randomized experiments with moderately high-dimensional covariates

The completely randomized experiment is the gold standard for causal inference. When covariate information is available for each experimental candidate, a typical approach is to use it for covariate adjustment to obtain more accurate treatment…

Methodology · Statistics 2025-06-10 Xin Lu, Fan Yang, Yuhao Wang

Variational Bayesian Multiple Imputation in High-Dimensional Regression Models With Missing Responses

Multiple imputation has become one of the standard methods in drawing inferences in many incomplete data applications. Applications of multiple imputation in relatively more complex settings, such as high-dimensional clustered data, require…

Methodology · Statistics 2025-04-08 Qiushuang Li, Recai Yucel