Sparse Regression via Range Counting

Data Structures and Algorithms 2020-01-01 v2 Computational Geometry

Abstract

The sparse regression problem, also known as the best subset selection problem, can be cast as follows: Given a set $S$ of $n$ points in $\mathbb{R}^d$, a point $y \in \mathbb{R}^d$, and an integer $2 \leq k \leq d$, find an affine combination of at most $k$ points of $S$ that is nearest to $y$. We describe an $O(n^{k-1} \log^{d-k+2} n)$-time randomized $(1+\varepsilon)$-approximation algorithm for this problem with $d$ and $\varepsilon$ constant. This is the first algorithm for this problem running in time $o(n^k)$. Its running time is similar to the query time of a data structure recently proposed by Har-Peled, Indyk, and Mahabadi (ICALP'18), while not requiring any preprocessing. Up to polylogarithmic factors, it matches a conditional lower bound relying on a conjecture about affine degeneracy testing. In the special case where $k = d = O(1)$, we also provide a simple $O_\delta(n^{d-1+\delta})$-time deterministic exact algorithm, for any $\delta > 0$. Finally, we show how to adapt the approximation algorithm for the sparse linear regression and sparse convex regression problems with the same running time, up to polylogarithmic factors.
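
To make the problem statement concrete, the following is a minimal sketch of the naive exhaustive baseline that tries every $k$-subset of $S$, which is the kind of $n^k$-type enumeration the abstract's $o(n^k)$ bound improves on. It is not the range-counting algorithm of the paper. The function names `dist_to_affine_hull` and `brute_force_sparse_affine_regression` and the tolerance `tol` are hypothetical, chosen only for illustration; the distance to each affine hull is computed by Gram-Schmidt projection.

```python
from itertools import combinations
from math import sqrt


def dist_to_affine_hull(points, y, tol=1e-12):
    """Euclidean distance from y to the affine hull of `points`.

    Illustrative helper (not from the paper): orthonormalize the
    directions p_i - p_1 with Gram-Schmidt, then remove from y - p_1
    its component in their span.
    """
    base = points[0]
    residual = [yi - bi for yi, bi in zip(y, base)]
    basis = []  # orthonormal basis of the hull's direction space
    for p in points[1:]:
        v = [pi - bi for pi, bi in zip(p, base)]
        for q in basis:
            c = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = sqrt(sum(vi * vi for vi in v))
        if norm > tol:  # skip affinely dependent points
            basis.append([vi / norm for vi in v])
    for q in basis:
        c = sum(ri * qi for ri, qi in zip(residual, q))
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    return sqrt(sum(ri * ri for ri in residual))


def brute_force_sparse_affine_regression(S, y, k):
    """Return (indices, distance) for the best subset of min(k, n) points.

    Subsets smaller than k need no separate enumeration: the affine
    hull of a subset is contained in the hull of any superset.
    """
    size = min(k, len(S))
    best_idx, best_dist = None, float("inf")
    for idx in combinations(range(len(S)), size):
        d = dist_to_affine_hull([S[i] for i in idx], y)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist
```

For example, with `S = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]`, `y = (3.0, 0.0)`, and `k = 2`, the point `y` lies on the line through the first two points, so the sketch selects indices `(0, 1)` with distance zero.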

Cite

@article{arxiv.1908.00351,
  title  = {Sparse Regression via Range Counting},
  author = {Jean Cardinal and Aurélien Ooms},
  journal= {arXiv preprint arXiv:1908.00351},
  year   = {2020}
}