Thresholded Lasso for high dimensional variable selection

Shuheng Zhou

Statistics Theory, 2025-10-28, v3

Authors: Shuheng Zhou

Abstract

Given $n$ noisy samples with $p$ dimensions, where $n \ll p$, we show that the multi-step thresholding procedure based on the Lasso, which we call the Thresholded Lasso, can accurately estimate a sparse vector $\beta \in {\mathbb R}^p$ in a linear model $Y = X \beta + \epsilon$, where $X_{n \times p}$ is a design matrix normalized to have column $\ell_2$-norm $\sqrt{n}$, and $\epsilon \sim N(0, \sigma^2 I_n)$. We show that under the restricted eigenvalue (RE) condition, it is possible to achieve the $\ell_2$ loss within a logarithmic factor of the ideal mean square error one would achieve with an oracle while selecting a sufficiently sparse model, hence achieving sparse oracle inequalities; the oracle would supply perfect information about which coordinates are non-zero and which are above the noise level. We also show that for the Gauss-Dantzig selector (Candès-Tao 07), if $X$ obeys a uniform uncertainty principle, one will achieve the sparse oracle inequalities as above, while allowing at most $s_0$ irrelevant variables in the model in the worst case, where $s_0 \leq s$ is the smallest integer such that for $\lambda = \sqrt{2 \log p/n}$, $\sum_{i=1}^p \min(\beta_i^2, \lambda^2 \sigma^2) \leq s_0 \lambda^2 \sigma^2$. Our simulation results on the Thresholded Lasso closely match our theoretical analysis.
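
To make the definition of $s_0$ concrete, the following is a minimal sketch in Python that evaluates it for a given coefficient vector. The function name `essential_sparsity`, the default `sigma=1.0`, and the small rounding tolerance are illustrative assumptions, not part of the paper; only the formula for $\lambda$ and the inequality defining $s_0$ come from the abstract.

```python
import math


def essential_sparsity(beta, n, sigma=1.0):
    """Return (s0, lam) for a coefficient vector beta and sample size n.

    lam = sqrt(2 log p / n), and s0 is the smallest integer such that
    sum_i min(beta_i^2, lam^2 sigma^2) <= s0 * lam^2 sigma^2.
    The name and signature are hypothetical, chosen for this sketch.
    """
    p = len(beta)
    if p < 2 or n <= 0 or sigma <= 0:
        raise ValueError("need p >= 2, n > 0 and sigma > 0")

    lam = math.sqrt(2.0 * math.log(p) / n)
    noise_level = (lam * sigma) ** 2  # lam^2 sigma^2

    # Each coordinate contributes at most one unit of noise_level.
    total = sum(min(b * b, noise_level) for b in beta)

    # Smallest integer s0 with total <= s0 * noise_level. The tiny
    # tolerance (an assumption of this sketch) guards against floating
    # point rounding when total is an exact multiple of noise_level.
    s0 = math.ceil(total / noise_level - 1e-12)
    return s0, lam


# Illustrative input: p = 1000, n = 200, three large and two small coefficients.
beta = [1.0, 1.0, 1.0, 0.1, 0.1] + [0.0] * 995
s0, lam = essential_sparsity(beta, n=200)
s = sum(1 for b in beta if b != 0.0)  # number of non-zero coordinates
print(s0, s)
```

In this illustrative input the three large coefficients each count fully, while the two small ones together contribute less than one unit of $\lambda^2 \sigma^2$, so $s_0$ comes out smaller than the number of non-zero coordinates, consistent with $s_0 \leq s$.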

Keywords

compressed sensing, statistical estimation, covariance estimation

Cite

@article{arxiv.2309.15355,
  title  = {Thresholded Lasso for high dimensional variable selection},
  author = {Shuheng Zhou},
  journal= {arXiv preprint arXiv:2309.15355},
  year   = {2025}
}

Comments

arXiv admin note: substantial text overlap with arXiv:1002.1583
