Sampling Algorithms and Coresets for Lp Regression

Data Structures and Algorithms 2007-07-13 v1

Abstract

The Lp regression problem takes as input a matrix $A \in \mathbb{R}^{n \times d}$, a vector $b \in \mathbb{R}^n$, and a number $p \in [1,\infty)$, and it returns as output a number ${\cal Z}$ and a vector $x_{opt} \in \mathbb{R}^d$ such that ${\cal Z} = \min_{x \in \mathbb{R}^d} \|Ax - b\|_p = \|Ax_{opt} - b\|_p$. In this paper, we construct coresets and obtain an efficient two-stage sampling-based approximation algorithm for the very overconstrained ($n \gg d$) version of this classical problem, for all $p \in [1, \infty)$. The first stage of our algorithm non-uniformly samples $\hat{r}_1 = O(36^p d^{\max\{p/2+1, p\}+1})$ rows of $A$ and the corresponding elements of $b$, and then it solves the Lp regression problem on the sample; we prove this is an 8-approximation. The second stage of our algorithm uses the output of the first stage to resample $\hat{r}_1/\epsilon^2$ constraints, and then it solves the Lp regression problem on the new sample; we prove this is a $(1+\epsilon)$-approximation. Our algorithm unifies, improves upon, and extends the existing algorithms for the special cases $p = 1$ and $p = 2$ of Lp regression. In the course of proving our result, we develop two concepts of independent interest: well-conditioned bases and subspace-preserving sampling.
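The first stage described above can be illustrated with a small plain-Python sketch. It is a minimal illustration under stated assumptions, not the authors' algorithm. The importance scores are hypothetical stand-ins: the paper derives them from a well-conditioned basis $U$ for the column span of $A$, not from $A$ itself. For $p = 2$ an orthonormal basis is the natural choice, and its squared row norms are the leverage scores. The Lp regression on the sample is solved with plain iteratively reweighted least squares (IRLS). IRLS is a common heuristic that is not taken from the paper and is not guaranteed to converge for every $p$. All helper names (`sample_size_exponent`, `subspace_sample`, `lp_regression_irls`, `first_stage_demo`) are hypothetical. Each row is kept independently with probability $q_i$ and rescaled by $q_i^{-1/p}$, so for any fixed $x$ the sampled objective is an unbiased estimate of $\|Ax - b\|_p^p$. The second, residual-aware resampling stage is not shown. `sample_size_exponent` simply evaluates the exponent of $d$ in the stated bound on $\hat{r}_1$, which is 2.5 for $p = 1$ and 3 for $p = 2$.

```python
import random


def sample_size_exponent(p):
    """Exponent of d in the stated first-stage size O(36^p * d^(max{p/2+1, p}+1))."""
    return max(p / 2 + 1, p) + 1


def residuals(A, b, x):
    """Entries of Ax - b for a matrix A stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]


def lp_cost(A, b, x, p):
    """The objective ||Ax - b||_p."""
    return sum(abs(r) ** p for r in residuals(A, b, x)) ** (1.0 / p)


def subspace_sample(A, b, scores, r, p, rng):
    """Hypothetical sampler: keep row i independently with probability
    q_i = min(1, r * s_i / sum(s)) and rescale the kept row and b_i by q_i^(-1/p).
    For rows with a positive score, the expected value of |scaled residual|^p
    equals the unscaled |residual|^p, so the sampled cost is unbiased."""
    total = sum(scores)
    SA, Sb = [], []
    for row, bi, s in zip(A, b, scores):
        q = min(1.0, r * s / total)
        if rng.random() < q:  # random() lies in [0, 1), so q == 1 always keeps the row
            scale = q ** (-1.0 / p)
            SA.append([scale * a for a in row])
            Sb.append(scale * bi)
    return SA, Sb


def solve(M, v):
    """Solve the square system M y = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [vi] for row, vi in zip(M, v)]  # augmented matrix [M | v]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (M[i][n] - sum(M[i][j] * y[j] for j in range(i + 1, n))) / M[i][i]
    return y


def lp_regression_irls(A, b, p, iters=50, eps=1e-8):
    """Approximately minimize ||Ax - b||_p with plain IRLS (a heuristic, not the paper's solver)."""
    d = len(A[0])
    w = [1.0] * len(A)  # the first pass is ordinary least squares
    x = [0.0] * d
    for _ in range(iters):
        # Weighted normal equations (A^T W A) x = A^T W b.
        M = [[sum(wk * row[i] * row[j] for wk, row in zip(w, A)) for j in range(d)]
             for i in range(d)]
        v = [sum(wk * row[i] * bk for wk, row, bk in zip(w, A, b)) for i in range(d)]
        x = solve(M, v)
        # Reweight by |r_k|^(p-2), flooring |r_k| at eps so p < 2 never divides by zero.
        w = [max(abs(rk), eps) ** (p - 2) for rk in residuals(A, b, x)]
    return x


def first_stage_demo(n=500, r=100, p=1.5, seed=0):
    """Synthetic check: sample about r rows, solve on the sample, and return the number of
    kept rows and the full-data cost ratio against IRLS run on all n rows."""
    rng = random.Random(seed)
    A = [[1.0, rng.uniform(-1, 1)] for _ in range(n)]
    b = [3.0 + 2.0 * row[1] + rng.uniform(-1, 1) for row in A]
    # Stand-in scores ||A_i||_p^p; the paper uses row norms of a well-conditioned basis U.
    scores = [sum(abs(a) ** p for a in row) for row in A]
    SA, Sb = subspace_sample(A, b, scores, r, p, rng)
    x_sample = lp_regression_irls(SA, Sb, p)
    x_full = lp_regression_irls(A, b, p)
    return len(SA), lp_cost(A, b, x_sample, p) / lp_cost(A, b, x_full, p)
```

For example, `first_stage_demo()` returns the number of sampled rows (about `r = 100` in expectation) and the ratio of the sampled solution's full-data cost to the cost of IRLS on all rows. On this easy synthetic instance the ratio is expected to be close to 1. Because the scores are stand-ins, however, the paper's 8-approximation guarantee does not formally apply to this sketch.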

Cite

@article{arxiv.0707.1714,
  title  = {Sampling Algorithms and Coresets for Lp Regression},
  author = {Anirban Dasgupta and Petros Drineas and Boulos Harb and Ravi Kumar and Michael W. Mahoney},
  journal= {arXiv preprint arXiv:0707.1714},
  year   = {2007}
}

Comments

19 pages, 1 figure