Related papers: Distributed Semi-Supervised Sparse Statistical Inf…
Semi-supervised learning has attracted significant attention due to the proliferation of applications featuring limited labeled data but abundant unlabeled data. In this paper, we examine the statistical inference problem in an…
Although various distributed machine learning schemes have been proposed recently for pure linear models and fully nonparametric models, little attention has been paid on distributed optimization for semi-paramemetric models with…
In semi-supervised learning, the prevailing understanding suggests that observing additional unlabeled samples improves estimation accuracy for linear parameters only in the case of model misspecification. In this work, we challenge such a…
High-dimensional regression models with regularized sparse estimation are widely applied. For statistical inferences, debiased methods are available about single coefficients or predictions with sparse new covariate vectors (also called…
This paper considers statistical inference for the explained variance $\beta^{\intercal}\Sigma \beta$ under the high-dimensional linear model $Y=X\beta+\epsilon$ in the semi-supervised setting, where $\beta$ is the regression vector and…
Inference in semi-supervised (SS) settings has gained substantial attention in recent years due to increased relevance in modern big-data problems. In a typical SS setting, there is a much larger-sized unlabeled data, containing only…
We consider statistical inference under a semi-supervised setting where we have access to both a labeled dataset consisting of pairs $\{X_i, Y_i \}_{i=1}^n$ and an unlabeled dataset $\{ X_i \}_{i=n+1}^{n+N}$. We ask the question: under what…
Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting, where the measured covariates are affected…
We consider random sample splitting for estimation and inference in high dimensional generalized linear models, where we first apply the lasso to select a submodel using one subsample and then apply the debiased lasso to fit the selected…
Statistical inferences for high-dimensional regression models have been extensively studied for their wide applications ranging from genomics, neuroscience, to economics. However, in practice, there are often potential unmeasured…
Completely randomized experiment is the gold standard for causal inference. When the covariate information for each experimental candidate is available, one typical way is to include them in covariate adjustments for more accurate treatment…
Gaussian graphical regressions have emerged as a powerful approach for regressing the precision matrix of a Gaussian graphical model on covariates, which, unlike traditional Gaussian graphical models, can help determine how graphs are…
In high-dimensional statistical inference in which the number of parameters to be estimated is larger than that of the holding data, regularized linear estimation techniques are widely used. These techniques have, however, some drawbacks.…
We devise a one-shot approach to distributed sparse regression in the high-dimensional setting. The key idea is to average "debiased" or "desparsified" lasso estimators. We show the approach converges at the same rate as the lasso as long…
We propose methodology for statistical inference for low-dimensional parameters of sparse precision matrices in a high-dimensional setting. Our method leads to a non-sparse estimator of the precision matrix whose entries have a Gaussian…
The development of modern technology has enabled data collection of unprecedented size, which poses new challenges to many statistical estimation and inference problems. This paper studies the maximum score estimator of a semi-parametric…
Debiasing group graphical lasso estimates enables statistical inference when multiple Gaussian graphical models share a common sparsity pattern. We analyze the estimation properties of group graphical lasso, establishing convergence rates…
We propose a novel, efficient approach for distributed sparse learning in high-dimensions, where observations are randomly partitioned across machines. Computationally, at each round our method only requires the master machine to solve a…
We consider statistical inference for a single coordinate of regression coefficients in high-dimensional linear models. Recently, the debiased estimators are popularly used for constructing confidence intervals and hypothesis testing in…
The premise of semi-supervised learning (SSL) is that combining labeled and unlabeled data yields significantly more accurate models. Despite empirical successes, the theoretical understanding of SSL is still far from complete. In this…