Sampling Algorithms and Coresets for Lp Regression
Abstract
The Lp regression problem takes as input a matrix $A \in \mathbb{R}^{n \times d}$, a vector $b \in \mathbb{R}^n$, and a number $p \in [1,\infty)$, and it returns as output a number $\mathcal{Z}$ and a vector $x_{\mathrm{opt}} \in \mathbb{R}^d$ such that $\mathcal{Z} = \min_{x \in \mathbb{R}^d} \|Ax - b\|_p = \|Ax_{\mathrm{opt}} - b\|_p$. In this paper, we construct coresets and obtain an efficient two-stage sampling-based approximation algorithm for the very overconstrained ($n \gg d$) version of this classical problem, for all $p \in [1,\infty)$. The first stage of our algorithm non-uniformly samples rows of $A$ and the corresponding elements of $b$, and then it solves the Lp regression problem on the sample; we prove this is an 8-approximation. The second stage of our algorithm uses the output of the first stage to resample constraints, and then it solves the Lp regression problem on the new sample; we prove this is a $(1+\epsilon)$-approximation. Our algorithm unifies, improves upon, and extends the existing algorithms for special cases of Lp regression, namely $p = 1, 2$. In the course of proving our result, we develop two concepts, well-conditioned bases and subspace-preserving sampling, that are of independent interest.
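The core idea behind the first stage can be illustrated with a minimal sketch. It assumes $d = 1$, so that the regression reduces to minimizing a convex function of one variable and can be solved in pure Python by ternary search. Suppose row $i$ is kept with probability $q_i$ and rescaled by $q_i^{-1/p}$. Its term $|a_i x - b_i|^p$ is then scaled by $q_i^{-1}$, so the expected value of the sampled objective equals the full $\|Ax - b\|_p^p$ for every $x$. The probabilities $q_i \propto |a_i|^p$ and the names `sampling_probs`, `sample_and_rescale`, and `solve_lp_1d` are hypothetical illustrations. The paper's own construction handles general $d$ and includes the second, resampling stage. Per the abstract, it is built on well-conditioned bases and subspace-preserving sampling, and it is more involved than this sketch.

```python
import random

def lp_residual(a, b, x, p):
    """Return ||a*x - b||_p^p for a single-column design (d = 1)."""
    return sum(abs(ai * x - bi) ** p for ai, bi in zip(a, b))

def sampling_probs(a, p, r):
    """Hypothetical importance probabilities q_i = min(1, r * |a_i|^p / ||a||_p^p).
    Since q_i <= r * |a_i|^p / ||a||_p^p, the expected sample size is at most r."""
    total = sum(abs(ai) ** p for ai in a)
    return [min(1.0, r * abs(ai) ** p / total) for ai in a]

def sample_and_rescale(a, b, p, r, rng):
    """Keep row i with probability q_i and scale it by q_i**(-1/p), so the sampled
    p-th-power residual is an unbiased estimate of the full one for every x.
    (Rows with a_i = 0 get q_i = 0 and are never kept; they only add a constant.)"""
    sa, sb = [], []
    for ai, bi, qi in zip(a, b, sampling_probs(a, p, r)):
        if qi > 0 and rng.random() < qi:
            s = qi ** (-1.0 / p)
            sa.append(s * ai)
            sb.append(s * bi)
    return sa, sb

def solve_lp_1d(a, b, p, lo=-1e3, hi=1e3, iters=200):
    """Minimize the convex function x -> ||a*x - b||_p^p by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if lp_residual(a, b, m1, p) < lp_residual(a, b, m2, p):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return (lo + hi) / 2

# Usage on synthetic, overconstrained data (n >> d = 1).
rng = random.Random(0)
n, p = 2000, 1.5
a = [rng.uniform(0.5, 2.0) for _ in range(n)]
b = [2.0 * ai + rng.gauss(0.0, 0.1) for ai in a]

x_full = solve_lp_1d(a, b, p)                      # solve on all n rows
sa, sb = sample_and_rescale(a, b, p, r=200, rng=rng)
x_samp = solve_lp_1d(sa, sb, p)                    # solve on the sample only

# Approximation ratio of the sampled solution, measured on the full problem.
ratio = (lp_residual(a, b, x_samp, p) / lp_residual(a, b, x_full, p)) ** (1.0 / p)
print(len(sa), x_full, x_samp, ratio)
```

Ternary search is used only because it keeps the example dependency-free. For $d > 1$ one would instead hand the sampled problem to a general convex solver, for example a linear program when $p = 1$.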
Cite
@article{arxiv.0707.1714,
title = {Sampling Algorithms and Coresets for Lp Regression},
author = {Anirban Dasgupta and Petros Drineas and Boulos Harb and Ravi Kumar and Michael W. Mahoney},
journal = {arXiv preprint arXiv:0707.1714},
year = {2007}
}
Comments
19 pages, 1 figure