
Consistent line clustering using geometric hypergraphs

Subjects: Statistics Theory; Machine Learning (v3, 2026-04-08)

Abstract

Subspace clustering becomes inherently difficult near intersections, where points from different subspaces are barely separated. Most existing theoretical results address this issue by imposing separation or sampling assumptions that limit the statistical effect of points near the intersection. We study a minimal setting of two intersecting lines in which the latent sampling law places polynomially large mass in small neighborhoods of the intersection. We derive information-theoretic lower bounds for exact and almost exact recovery under Gaussian noise. In particular, we show that the exact-recovery threshold is determined by the rate at which the latent law concentrates near the intersection. Since any two points are collinear, pairwise information alone does not reveal whether they are sampled from the same latent line. We therefore construct a hypergraph in which nearly collinear triples form hyperedges, and study the resulting hypergraph similarity matrix. Under a simple regularity condition on the latent distribution, we introduce a spectral algorithm that achieves the information-theoretic bounds up to polylogarithmic factors.
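The pipeline described above (noisy samples from two intersecting lines, hyperedges for nearly collinear triples, a similarity matrix built from those hyperedges, then spectral clustering) can be illustrated with a minimal sketch. Several pieces are illustrative assumptions, not the authors' construction:

- The toy sampler (uniform latent positions on two lines through the origin) and the helper names `sample_two_lines`, `hypergraph_similarity` and `spectral_two_clusters` are hypothetical.
- The collinearity test uses the smallest triangle altitude with an assumed threshold `tau`.
- The similarity matrix is taken as W[i, j] = number of hyperedges containing both i and j.
- The final step is a plain 2-means on the top two eigenvectors.

```python
import itertools

import numpy as np


def sample_two_lines(n, angle, sigma, rng):
    """Toy model (assumption): two lines through the origin, uniform latent positions, Gaussian noise."""
    labels = rng.integers(0, 2, size=n)
    t = rng.uniform(-1.0, 1.0, size=n)  # latent position along the chosen line
    directions = np.array([[1.0, 0.0], [np.cos(angle), np.sin(angle)]])
    points = t[:, None] * directions[labels] + sigma * rng.standard_normal((n, 2))
    return points, labels


def hypergraph_similarity(points, tau):
    """Hypothetical similarity: W[i, j] = number of nearly collinear triples containing i and j."""
    n = len(points)
    triples = np.array(list(itertools.combinations(range(n), 3)))
    a, b, c = (points[triples[:, k]] for k in range(3))
    # Twice the triangle area, from the 2D cross product of two edge vectors.
    area2 = np.abs((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
                   - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0]))
    sides = np.stack([np.linalg.norm(b - a, axis=1),
                      np.linalg.norm(c - a, axis=1),
                      np.linalg.norm(c - b, axis=1)])
    # Smallest altitude = 2 * area / longest side; small means nearly collinear.
    height = area2 / np.maximum(sides.max(axis=0), 1e-12)
    edges = triples[height < tau]  # each row is one hyperedge
    W = np.zeros((n, n))
    for i, j in ((0, 1), (0, 2), (1, 2)):
        np.add.at(W, (edges[:, i], edges[:, j]), 1.0)
    return W + W.T  # combinations are sorted, so only the upper triangle was filled


def spectral_two_clusters(W, n_iter=20):
    """Embed with the top two eigenvectors, then run a small 2-means (Lloyd) loop."""
    _, vecs = np.linalg.eigh(W)  # eigenvalues in ascending order
    emb = vecs[:, -2:]
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(dist), dist.shape)  # farthest pair as initial centers
    centers = emb[[i, j]].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(emb[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = emb[labels == k].mean(axis=0)
    return labels
```

For example, draw points with `sample_two_lines(60, np.pi / 3, 0.005, np.random.default_rng(0))`. Then pass them to `hypergraph_similarity(points, tau=0.015)` and cluster the result with `spectral_two_clusters(W)`. Labels are recovered only up to a swap.

The threshold `tau` is set to a few noise standard deviations here, which is an assumption. Two very close points form a thin triangle with almost any third point. This echoes the abstract's remark that pairwise information alone cannot reveal line membership. Points near the intersection are also genuinely ambiguous under this test.

The full triple enumeration is O(n³) and is only meant for small n.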


Cite

@article{arxiv.2505.24868,
  title  = {Consistent line clustering using geometric hypergraphs},
  author = {Kalle Alaluusua and Konstantin Avrachenkov and B. R. Vinay Kumar and Lasse Leskelä},
  journal= {arXiv preprint arXiv:2505.24868},
  year   = {2026}
}

Comments

Major revision: new information-theoretic analysis for latent sampling laws concentrating near the intersection, recovery results for arbitrary fixed angles between the latent lines, revised spectral clustering guarantees, and substantial expository improvements (60 pages, 5 figures, 1 table)
