Efficient QR-based Column Subset Selection through Randomized Sparse Embeddings

Numerical Analysis, 2025-09-05, v2

Abstract

In this paper, we introduce an efficient algorithm for column subset selection that combines the column-pivoted QR factorization with sparse subspace embeddings. The proposed method, SE-QRSC, is particularly effective for wide matrices with significantly more columns than rows. Starting from a matrix $A$, the algorithm selects $k$ columns from the sketched matrix $B = A \Omega^T$, where $\Omega$ is a sparse subspace embedding of $\mathrm{range}(A^T)$. The sparsity structure of $\Omega$ is then exploited to map the selected pivots back to the corresponding columns of $A$, which are then used to produce the final subset of selected columns. We prove that this procedure yields a factorization with strong rank-revealing properties, thus revealing the spectrum of $A$. The resulting bounds exhibit a reduced dependence on the number of columns of $A$ compared to those obtained from the strong rank-revealing QR factorization of $A$. Moreover, when the leverage scores are known, such as for orthogonal matrices, or can be efficiently approximated, the bounds become entirely independent of the column dimension. For general matrices, the algorithm can be extended by first applying an additional subspace embedding of $\mathrm{range}(A)$.
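
The pipeline described above (sketch, pivot on the sketch, map pivots back, select again) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: $\Omega$ is taken to be a CountSketch-style matrix with a single $\pm 1$ entry per column, the sketch dimension `d` is a free parameter, the final subset is obtained by an ordinary column-pivoted QR on the candidate columns (the abstract does not specify this step, and the paper's analysis concerns strong rank-revealing factorizations), and the function name `sketched_column_selection` is hypothetical.

```python
import numpy as np
from scipy import sparse
from scipy.linalg import qr


def sketched_column_selection(A, k, d, seed=0):
    """Pick k column indices of a wide matrix A via a sparse sketch.

    Illustrative assumptions (not taken from the paper):
    - Omega is a d x n CountSketch-style matrix, one +/-1 entry per column;
    - the final subset comes from plain column-pivoted QR on the candidates.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape

    # Each column i of A is hashed to one bucket (row of Omega) with a sign.
    buckets = rng.integers(0, d, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    Omega = sparse.csr_matrix((signs, (buckets, np.arange(n))), shape=(d, n))

    # Sketched matrix B = A Omega^T (m x d): column j of B is a signed sum
    # of the columns of A that were hashed to bucket j.
    B = (Omega @ A.T).T

    # Column-pivoted QR on the sketch; the first k pivots are buckets.
    _, _, piv_B = qr(B, mode="economic", pivoting=True)
    chosen_buckets = piv_B[:k]

    # Use the sparsity pattern of Omega to map buckets back to columns of A.
    candidates = np.flatnonzero(np.isin(buckets, chosen_buckets))

    # Second pivoted QR, restricted to the candidate columns of A.
    _, _, piv_C = qr(A[:, candidates], mode="economic", pivoting=True)
    return np.sort(candidates[piv_C[:k]])
```

For example, with a random $20 \times 2000$ matrix, `sketched_column_selection(A, k=10, d=80)` runs the first pivoted QR on a $20 \times 80$ sketch and the second on only those columns of $A$ that fall in the 10 selected buckets, instead of pivoting over all 2000 columns.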

Cite

@article{arxiv.2509.03198,
  title  = {Efficient QR-based Column Subset Selection through Randomized Sparse Embeddings},
  author = {Israa Fakih and Laura Grigori},
  journal= {arXiv preprint arXiv:2509.03198},
  year   = {2025}
}