Deterministic Coreset for Lp Subspace

Data Structures and Algorithms 2026-05-18 v3 Machine Learning

Abstract

We introduce the first iterative algorithm for constructing an $\varepsilon$-coreset that guarantees a deterministic $\ell_p$ subspace embedding for any $p \in [1,\infty)$ and any $\varepsilon > 0$. For a given full-rank matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ with $n \gg d$, a matrix $\mathbf{X}' \in \mathbb{R}^{m \times d}$ is an $(\varepsilon,\ell_p)$-subspace embedding of $\mathbf{X}$ if, for every $\mathbf{q} \in \mathbb{R}^d$, $(1-\varepsilon)\|\mathbf{X}\mathbf{q}\|_{p}^{p} \leq \|\mathbf{X}'\mathbf{q}\|_{p}^{p} \leq (1+\varepsilon)\|\mathbf{X}\mathbf{q}\|_{p}^{p}$. In this paper, $\mathbf{X}'$ is a weighted subset of the rows of $\mathbf{X}$, which is commonly known in the literature as a coreset. In every iteration, the algorithm ensures that the loss on the maintained set is upper and lower bounded by the loss on the original dataset, with appropriate scalings. Because the loss is bounded in this way, our coreset, unlike typical coresets, gives a deterministic guarantee for the $\ell_p$ subspace embedding. For an error parameter $\varepsilon$, our algorithm takes $O(\mathrm{poly}(n,d,\varepsilon^{-1}))$ time and returns a deterministic $\varepsilon$-coreset for $\ell_p$ subspace embedding of size $O\left(\frac{d^{\max\{1,p/2\}}}{\varepsilon^{2}}\right)$. This removes the $\log$ factors in the coreset size, which had been a long-standing open problem. Our coresets are optimal, as their size matches the lower bound. As an application, our coreset can also be used to approximately solve the $\ell_p$ regression problem in a deterministic manner.
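To make the embedding condition concrete, the following minimal Python sketch measures how far a candidate weighted subset is from satisfying the two-sided inequality. It is not the paper's construction algorithm. The function names `lp_cost` and `max_relative_error` are hypothetical, and the sketch assumes that a weight $w_i$ on a selected row $\mathbf{x}_i$ enters the cost as $\sum_i w_i |\mathbf{x}_i^\top \mathbf{q}|^p$. Because it only tries randomly sampled query vectors, it can refute the $(\varepsilon,\ell_p)$ property but cannot certify it for every $\mathbf{q}$.

```python
import random


def lp_cost(rows, weights, q, p):
    """Weighted l_p cost: sum_i w_i * |<x_i, q>|^p (assumed weighting convention)."""
    total = 0.0
    for x, w in zip(rows, weights):
        dot = sum(xi * qi for xi, qi in zip(x, q))
        total += w * abs(dot) ** p
    return total


def max_relative_error(X, subset, weights, p, num_queries=200, seed=0):
    """Largest observed | ||X'q||_p^p - ||Xq||_p^p | / ||Xq||_p^p over sampled q.

    X       : list of rows (each a list of d floats)
    subset  : indices of the rows kept in the candidate coreset
    weights : one non-negative weight per kept row
    This is a spot check on Gaussian queries only, not a proof for all q.
    """
    rng = random.Random(seed)
    d = len(X[0])
    unit_weights = [1.0] * len(X)
    kept_rows = [X[i] for i in subset]
    worst = 0.0
    for _ in range(num_queries):
        q = [rng.gauss(0.0, 1.0) for _ in range(d)]
        full = lp_cost(X, unit_weights, q, p)
        if full == 0.0:
            continue  # skip degenerate queries to avoid division by zero
        approx = lp_cost(kept_rows, weights, q, p)
        worst = max(worst, abs(approx - full) / full)
    return worst


# Toy data: every row appears twice, so keeping one copy with weight 2
# reproduces the full cost up to floating-point rounding.
base = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
X = base + base
err_weighted = max_relative_error(X, subset=[0, 1, 2], weights=[2.0, 2.0, 2.0], p=1.5)
# Dropping the same rows without reweighting halves the cost instead.
err_unweighted = max_relative_error(X, subset=[0, 1, 2], weights=[1.0, 1.0, 1.0], p=1.5)
```

A candidate subset would be consistent with the definition on the sampled queries when the returned value is at most $\varepsilon$; the toy data shows why the weights matter, since the same rows with unit weights lose half of the cost.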

Cite

@article{arxiv.2601.00361,
  title  = {Deterministic Coreset for Lp Subspace},
  author = {Rachit Chhaya and Anirban Dasgupta and Dan Feldman and Supratim Shit},
  journal= {arXiv preprint arXiv:2601.00361},
  year   = {2026}
}

Comments

The proofs of some claims are incomplete