
Sample-Efficient Linear Regression with Self-Selection Bias

Subjects: Statistics Theory; Data Structures and Algorithms; Machine Learning. Version v1, 2024-02-23.

Abstract

We consider the problem of linear regression with self-selection bias in the unknown-index setting, as introduced in recent work by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [STOC 2023]. In this model, one observes $m$ i.i.d. samples $(\mathbf{x}_{\ell},z_{\ell})_{\ell=1}^m$ where $z_{\ell}=\max_{i\in [k]}\{\mathbf{x}_{\ell}^T\mathbf{w}_i+\eta_{i,\ell}\}$, but the maximizing index $i_{\ell}$ is unobserved. Here, the $\mathbf{x}_{\ell}$ are assumed to be $\mathcal{N}(0,I_n)$, and the noise $\boldsymbol{\eta}_{\ell}\sim \mathcal{D}$ is centered and independent of $\mathbf{x}_{\ell}$. We provide a novel and near-optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_1,\ldots,\mathbf{w}_k\in \mathbb{R}^n$ up to additive $\ell_2$-error $\varepsilon$ with polynomial sample complexity $\tilde{O}(n)\cdot \mathsf{poly}(k,1/\varepsilon)$ and significantly improved time complexity $\mathsf{poly}(n,k,1/\varepsilon)+O(\log(k)/\varepsilon)^{O(k)}$. When $k=O(1)$, our algorithm runs in $\mathsf{poly}(n,1/\varepsilon)$ time, generalizing the polynomial guarantee of an explicit moment-matching algorithm of Cherapanamjeri et al. for $k=2$ and when it is known that $\mathcal{D}=\mathcal{N}(0,I_k)$. Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression, where the added noise is taken outside the maximum. For this problem, our algorithm is efficient in a much larger range of $k$ than the state of the art due to Ghosh, Pananjady, Guntuboyina, and Ramchandran [IEEE Trans. Inf. Theory 2022] when $\varepsilon$ is not too small, and it leads to improved algorithms for any $\varepsilon$ by providing a warm start for existing local convergence methods.
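To make the two observation models concrete, here is a minimal simulation sketch in plain Python. It is not the authors' recovery algorithm; it only generates data. It assumes, purely for illustration, Gaussian noise $\mathcal{D}=\mathcal{N}(0,\sigma^2 I_k)$ (the abstract only requires $\mathcal{D}$ to be centered and independent of $\mathbf{x}$). The helpers `sample_self_selection`, `sample_max_linear`, and `recovery_error` are hypothetical names. Because the index $i_\ell$ is never observed, the labels of the $\mathbf{w}_i$ cannot be identified, so `recovery_error` scores an estimate after the best relabeling. This is an assumption about how "recover up to $\ell_2$-error $\varepsilon$" is measured.

```python
import math
import random
from itertools import permutations
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def sample_self_selection(ws: List[Vector], m: int, noise_std: float = 1.0,
                          seed: int = 0) -> List[Tuple[Vector, float]]:
    """Self-selection model: z = max_i (x^T w_i + eta_i); the argmax i is discarded.

    Assumes x ~ N(0, I_n) and (illustration only) eta ~ N(0, noise_std^2 I_k).
    """
    rng = random.Random(seed)
    n = len(ws[0])
    samples = []
    for _ in range(m):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Noise enters inside the max: each of the k responses gets its own eta_i.
        z = max(dot(x, w) + rng.gauss(0.0, noise_std) for w in ws)
        samples.append((x, z))
    return samples


def sample_max_linear(ws: List[Vector], m: int, noise_std: float = 1.0,
                      seed: int = 0) -> List[Tuple[Vector, float]]:
    """Max-linear model: z = max_i (x^T w_i) + eta, with one noise draw outside the max."""
    rng = random.Random(seed)
    n = len(ws[0])
    samples = []
    for _ in range(m):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = max(dot(x, w) for w in ws) + rng.gauss(0.0, noise_std)
        samples.append((x, z))
    return samples


def recovery_error(ws_true: List[Vector], ws_hat: List[Vector]) -> float:
    """Largest ell_2 error over the k vectors, minimized over all relabelings.

    Brute force over k! permutations, so only suitable for small k.
    """
    k = len(ws_true)
    best = math.inf
    for perm in permutations(range(k)):
        err = max(math.dist(ws_true[i], ws_hat[perm[i]]) for i in range(k))
        best = min(best, err)
    return best
```

For example, with $k=3$ and $n=2$, `sample_self_selection([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], m=5)` returns five pairs $(\mathbf{x}_\ell, z_\ell)$ with no record of which $\mathbf{w}_i$ attained the maximum. Setting `noise_std=0.0` makes both models coincide with the noiseless $z=\max_i \mathbf{x}^T\mathbf{w}_i$, which is a quick sanity check on the simulator.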

Cite

@article{arxiv.2402.14229,
  title  = {Sample-Efficient Linear Regression with Self-Selection Bias},
  author = {Jason Gaitonde and Elchanan Mossel},
  journal= {arXiv preprint arXiv:2402.14229},
  year   = {2024}
}

Comments

40 pages