Related papers: Efficient volume sampling for row/column subset se…

Polynomial Time Algorithms for Dual Volume Sampling

We study dual volume sampling, a method for selecting $k$ columns from an $n \times m$ short and wide matrix ($n \le k \le m$) such that the probability of selection is proportional to the volume spanned by the rows of the induced submatrix. This method…

Machine Learning · Statistics 2017-11-17 Chengtao Li, Stefanie Jegelka, Suvrit Sra

Reverse iterative volume sampling for linear regression

We study the following basic machine learning task: Given a fixed set of $d$-dimensional input points for a linear regression problem, we wish to predict a hidden response value for each of the points. We can only afford to attain the…

Machine Learning · Computer Science 2018-06-07 Michał Dereziński, Manfred K. Warmuth

Online Row Sampling

Finding a small spectral approximation for a tall $n \times d$ matrix $A$ is a fundamental numerical primitive. For a number of reasons, one often seeks an approximation whose rows are sampled from those of $A$. Row sampling improves…

Data Structures and Algorithms · Computer Science 2016-04-20 Michael B. Cohen, Cameron Musco, Jakub Pachocki

Leveraged volume sampling for linear regression

Suppose an $n \times d$ design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested. The goal is to sample only a small number $k \ll n$ of the responses, and then produce a…

Machine Learning · Computer Science 2018-09-06 Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

Approximate volume and integration for basic semi-algebraic sets

Given a basic compact semi-algebraic set $\mathbf{K}\subset\mathbb{R}^n$, we introduce a methodology that generates a sequence converging to the volume of $\mathbf{K}$. This sequence is obtained from optimal values of a hierarchy of either semidefinite or linear…

Optimization and Control · Mathematics 2015-05-13 Didier Henrion, Jean Bernard Lasserre, Carlo Savorgnan

Faster Subset Selection for Matrices and Applications

We study subset selection for matrices defined as follows: given a matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$ ($m > n$) and an oversampling parameter $k$ ($n \le k \le m$), select a subset of $k$ columns from $\mathbf{X}$ such that the pseudo-inverse of…

Data Structures and Algorithms · Computer Science 2013-06-25 Haim Avron, Christos Boutsidis

Non-Adaptive Adaptive Sampling on Turnstile Streams

Adaptive sampling is a useful algorithmic tool for data summarization problems in the classical centralized setting, where the entire dataset is available to the single processor performing the computation. Adaptive sampling repeatedly…

Data Structures and Algorithms · Computer Science 2020-04-24 Sepideh Mahabadi, Ilya Razenshteyn, David P. Woodruff, Samson Zhou

Subset Sampling and Its Extensions

This paper studies the subset sampling problem. The input is a set $\mathcal{S}$ of $n$ records together with a function $\mathbf{p}$ that assigns each record $v\in\mathcal{S}$ a probability $\mathbf{p}(v)$. A query returns a random…

Data Structures and Algorithms · Computer Science 2023-07-24 Jinchao Huang, Sibo Wang

Efficient QR-based Column Subset Selection through Randomized Sparse Embeddings

In this paper, we introduce an efficient algorithm for column subset selection that combines the column-pivoted QR factorization with sparse subspace embeddings. The proposed method, SE-QRSC, is particularly effective for wide matrices with…

Numerical Analysis · Mathematics 2025-09-05 Israa Fakih, Laura Grigori

Near-Optimal Averaging Samplers and Matrix Samplers

We present the first efficient averaging sampler that achieves asymptotically optimal randomness complexity and near-optimal sample complexity. For any $\delta < \varepsilon$ and any constant $\alpha > 0$, our sampler uses $m + O(\log (1 /…

Computational Complexity · Computer Science 2025-08-18 Zhiyang Xun, David Zuckerman

Provably Correct Algorithms for Matrix Column Subset Selection with Selectively Sampled Data

We consider the problem of matrix column subset selection, which selects a subset of columns from an input matrix such that the input can be well approximated by the span of the selected columns. Column subset selection has been applied to…

Machine Learning · Statistics 2018-01-26 Yining Wang, Aarti Singh

Fair and Representative Subset Selection from Data Streams

We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be…

Data Structures and Algorithms · Computer Science 2021-02-15 Yanhao Wang, Francesco Fabbri, Michael Mathioudakis

Sampling an Edge in Sublinear Time Exactly and Optimally

Sampling edges from a graph in sublinear time is a fundamental problem and a powerful subroutine for designing sublinear-time algorithms. Suppose we have access to the vertices of the graph and know a constant-factor approximation to the…

Data Structures and Algorithms · Computer Science 2022-11-15 Talya Eden, Shyam Narayanan, Jakub Tětek

Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss

We study the column subset selection problem with respect to the entrywise $\ell_1$-norm loss. It is known that in the worst case, to obtain a good rank-$k$ approximation to a matrix, one needs an arbitrarily large $n^{\Omega(1)}$ number of…

Data Structures and Algorithms · Computer Science 2020-04-20 Zhao Song, David P. Woodruff, Peilin Zhong

Proportional Volume Sampling and Approximation Algorithms for A-Optimal Design

We study the optimal design problems where the goal is to choose a set of linear measurements to obtain the most accurate estimate of an unknown vector in $d$ dimensions. We study the $A$-optimal design variant where the objective is to…

Data Structures and Algorithms · Computer Science 2018-07-18 Aleksandar Nikolov, Mohit Singh, Uthaipon Tao Tantipongpipat

On Subspace Approximation and Subset Selection in Fewer Passes by MCMC Sampling

We consider the problem of subset selection for $\ell_{p}$ subspace approximation, i.e., given $n$ points in $d$ dimensions, we need to pick a small, representative subset of the given points such that its span gives $(1+\epsilon)$…

Computational Geometry · Computer Science 2021-03-23 Amit Deshpande, Rameshwar Pratap

Efficient Uniform Sampling of Surjections via their Profiles

In this article, we develop efficient sampling algorithms for random surjections from $[n]$ to $[k]$ for all $n \geq k$. We make no assumption about $n$ and $k$. In particular, we do not make the common assumption that the ratio…

Data Structures and Algorithms · Computer Science 2026-05-26 Arnaud Carayol, Pablo Rotondo

Low rank approximation of positive semi-definite symmetric matrices using Gaussian elimination and volume sampling

Positive semi-definite matrices commonly occur as normal matrices of least squares problems in statistics or as kernel matrices in machine learning and approximation theory. They are typically large and dense. Thus algorithms to solve…

Numerical Analysis · Mathematics 2020-12-01 Markus Hegland, Frank deHoog

An Efficient Streaming Algorithm for Approximating Graphlet Distributions

In recent years, the problem of computing the frequencies of the induced $k$-vertex subgraphs of a graph, or $k$-graphlets, has become central. One approach for this problem is to sample $k$-graphlets randomly. Classic algorithms for…

Data Structures and Algorithms · Computer Science 2026-04-29 Marco Bressan, T-H. Hubert Chan, Qipeng Kuang, Mauro Sozio

The Power of Uniform Sampling for $k$-Median

We study the power of uniform sampling for $k$-Median in various metric spaces. We relate the query complexity for approximating $k$-Median to a key parameter of the dataset, called the balancedness $\beta \in (0, 1]$ (with $1$ being…

Data Structures and Algorithms · Computer Science 2023-02-23 Lingxiao Huang, Shaofeng H.-C. Jiang, Jianing Lou