Provable Deterministic Leverage Score Sampling

Data Structures and Algorithms, Information Theory (math.IT), Numerical Analysis, Statistics Theory, Machine Learning. 2014-06-04, v3

Abstract

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate." To obtain provable guarantees, previous work requires randomized sampling of the columns with probabilities proportional to their leverage scores. In this work, we provide a novel theoretical analysis of deterministic leverage score sampling. We show that such deterministic sampling can be provably as accurate as its randomized counterparts if the leverage scores follow a moderately steep power-law decay. We support this power-law assumption by providing empirical evidence that such decay laws are abundant in real-world data sets. We then demonstrate empirically the performance of deterministic leverage score sampling, which often matches or outperforms state-of-the-art techniques.
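The deterministic selection described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the common definition of the rank-k leverage score of a column as the squared Euclidean norm of the corresponding row of the top-k right singular vectors, and it takes the number of kept columns `c` as a user-supplied parameter. The function names `leverage_scores`, `top_leverage_columns`, and `projection_error` are hypothetical and chosen only for this example.

```python
import numpy as np

def leverage_scores(A, k):
    """Rank-k column leverage scores (assumed definition, see lead-in)."""
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    # Score of column j = squared norm of the j-th column of Vt[:k, :].
    return np.sum(Vt[:k, :] ** 2, axis=0)

def top_leverage_columns(A, k, c):
    """Deterministically pick the c columns with the largest leverage scores."""
    scores = leverage_scores(A, k)
    idx = np.argsort(-scores, kind="stable")[:c]
    return idx, scores

def projection_error(A, idx):
    """Frobenius error after projecting A onto the span of the chosen columns."""
    C = A[:, idx]
    A_hat = C @ np.linalg.pinv(C) @ A
    return np.linalg.norm(A - A_hat, "fro")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 30))  # synthetic data, for illustration only
    idx, scores = top_leverage_columns(A, k=5, c=8)
    print("selected columns:", idx)
    print("projection error:", projection_error(A, idx))
```

A randomized counterpart would instead draw columns with probabilities proportional to `scores`. The sketch replaces that draw with a plain sort, which is the deterministic variant the abstract analyzes.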

Cite

@article{arxiv.1404.1530,
  title  = {Provable Deterministic Leverage Score Sampling},
  author = {Dimitris Papailiopoulos and Anastasios Kyrillidis and Christos Boutsidis},
  journal= {arXiv preprint arXiv:1404.1530},
  year   = {2014}
}

Comments

20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
