Related papers: Imputation for High-Dimensional Linear Regression
Advancements in data collection techniques and the heterogeneity of data resources can yield high percentages of missing observations on variables, such as block-wise missing data. Under missing-data scenarios, traditional methods such as…
This paper is concerned with inference on the regression function of a high-dimensional linear model when outcomes are missing at random. We propose an estimator which combines a Lasso pilot estimate of the regression function with a bias…
Although a majority of the theoretical literature in high-dimensional statistics has focused on settings which involve fully-observed data, settings with missing values and corruptions are common in practice. We consider the problems of…
We study regression discontinuity designs in which many predetermined covariates, possibly many more than the number of observations, can be used to increase the precision of treatment effect estimates. We consider a two-step estimator…
In this paper we recast the problem of missing values in the covariates of a regression model as a latent Gaussian Markov random field (GMRF) model in a fully Bayesian framework. Our proposed approach is based on the definition of the…
Sparse regression such as the Lasso has achieved great success in handling high-dimensional data. However, one of the biggest practical problems is that high-dimensional data often contain large amounts of missing values. Convex Conditioned…
This paper studies the inference of the regression coefficient matrix under multivariate response linear regressions in the presence of hidden variables. A novel procedure for constructing confidence intervals of entries of the coefficient…
This paper studies inference in the high-dimensional linear regression model with outliers. Sparsity constraints are imposed on the vector of coefficients of the covariates. The number of outliers can grow with the sample size while their…
We propose a residual randomization procedure designed for robust Lasso-based inference in the high-dimensional setting. Compared to earlier work that focuses on sub-Gaussian errors, the proposed procedure is designed to work robustly in…
Missing data are frequently encountered in high-dimensional problems, but they are usually difficult to deal with using standard algorithms, such as the expectation-maximization (EM) algorithm and its variants. To tackle this difficulty,…
For statistical inference on regression models with a diverging number of covariates, the existing literature typically makes sparsity assumptions on the inverse of the Fisher information matrix. Such assumptions, however, are often…
Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research has shown an increasing trend towards the use of modern Machine Learning algorithms for…
This research deals with the estimation and imputation of missing data in longitudinal models with a Poisson response variable inflated with zeros. A methodology is proposed that is based on the use of maximum likelihood, assuming that data…
Missing data theory deals with the statistical methods used when data are missing. Missing data occurs when some values are not stored or observed for variables of interest. However, most of the statistical theory assumes that data…
A basic principle in the design of observational studies is to approximate the randomized experiment that would have been conducted under controlled circumstances. Now, linear regression models are commonly used to analyze observational…
This paper develops a new framework, called modular regression, to utilize auxiliary information -- such as variables other than the original features or additional data sets -- in the training process of linear models. At a high level, our…
Regression models with both high-dimensional responses and covariates have attracted growing attention. Standard multivariate regression models become inadequate when the response variables depend not only on observed covariates but also on…
Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting, where the measured covariates are affected…
Missing data arises when certain values are not recorded or observed for variables of interest. However, most statistical theory assumes complete data availability. To address incomplete databases, one approach is to fill the gaps…
Beta regression is commonly employed when the outcome variable is a proportion. Since its conception, the approach has been widely used in applications spanning various scientific fields. A series of extensions have been proposed over time,…