Related papers: Statistical Inference for High-Dimensional Linear …
Although a majority of the theoretical literature in high-dimensional statistics has focused on settings which involve fully-observed data, settings with missing values and corruptions are common in practice. We consider the problems of…
Advancements in data collection techniques and the heterogeneity of data resources can yield high percentages of missing observations on variables, such as block-wise missing data. Under missing-data scenarios, traditional methods such as…
As the availability of omics data has increased in the last few years, more multi-omics data have been generated, that is, high-dimensional molecular data consisting of several types such as genomic, transcriptomic, or proteomic data, all…
In linear models, omitting a covariate that is orthogonal to covariates in the model does not result in biased coefficient estimation. This in general does not hold for longitudinal data, where additional assumptions are needed to get…
Incomplete covariate vectors are known to be problematic for estimation and inferences on model parameters, but their impact on prediction performance is less understood. We develop an imputation-free method that builds on a random…
We study a linear high-dimensional regression model in a semi-supervised setting, where for many observations only the vector of covariates $X$ is given with no response $Y$. We do not make any sparsity assumptions on the vector of…
Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the…
We consider parameter estimation and inference when data feature blockwise, non-monotone missingness. Our approach, rooted in semiparametric theory and inspired by prediction-powered inference, leverages off-the-shelf AI (predictive or…
Missing values in datasets are common in applied statistics. For regression problems, theoretical work thus far has largely considered the issue of missing covariates as distinct from missing responses. However, in practice, many datasets…
Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting, where the measured covariates are affected…
We study a high-dimensional regression setting under the assumption of known covariate distribution. We aim at estimating the amount of explained variation in the response by the best linear function of the covariates (the signal level). In…
In this paper we study covariance estimation with missing data. We consider missing data mechanisms that can be independent of the data, or have a time varying dependency. Additionally, observed variables may have arbitrary (non uniform)…
The cause of failure in cohort studies that involve competing risks is frequently incompletely observed. To address this, several methods have been proposed for the semiparametric proportional cause-specific hazards model under a missing at…
The standard quantile regression model assumes a linear relationship at the quantile of interest and that all variables are observed. We relax these assumptions by considering a partial linear model while allowing for missing linear…
Large datasets are often affected by cell-wise outliers in the form of missing or erroneous data. However, discarding any samples containing outliers may result in a dataset that is too small to accurately estimate the covariance matrix.…
This paper proposes a fast and accurate method for sparse regression in the presence of missing data. The underlying statistical model encapsulates the low-dimensional structure of the incomplete data matrix and the sparsity of the…
In this paper, we propose an empirical likelihood-based weighted estimator of regression parameter in quantile regression model with nonignorable missing covariates. The proposed estimator is computationally simple and achieves…
We consider statistical inference for a finite-dimensional parameter in a regular semiparametric model under a distributed setting with blockwise missingness, where entire blocks of variables are unavailable at certain sites and sharing…
Completely randomized experiment is the gold standard for causal inference. When the covariate information for each experimental candidate is available, one typical way is to include them in covariate adjustments for more accurate treatment…
Multiple imputation has become one of the standard methods in drawing inferences in many incomplete data applications. Applications of multiple imputation in relatively more complex settings, such as high-dimensional clustered data, require…