Related papers: An Explicit Sampling Dependent Spectral Error Boun…

Fair Column Subset Selection

The problem of column subset selection asks for a subset of columns from an input matrix such that the matrix can be reconstructed as accurately as possible within the span of the selected columns. A natural extension is to consider a…

Machine Learning · Computer Science 2024-08-13 Antonis Matakos , Bruno Ordozgoiti , Suhas Thejaswi

Using a Non-Commutative Bernstein Bound to Approximate Some Matrix Algorithms in the Spectral Norm

We focus on \emph{row sampling} based approximations for matrix algorithms, in particular matrix multipication, sparse matrix reconstruction, and \math{\ell_2} regression. For \math{\matA\in\R^{m\times d}} (\math{m} points in \math{d\ll m}…

Data Structures and Algorithms · Computer Science 2011-03-29 Malik Magdon-Ismail

Optimality and computational barriers in variable selection under dependence

We study the optimal sample complexity of variable selection in linear regression under general design covariance, and show that subset selection is optimal while under standard complexity assumptions, efficient algorithms for this problem…

Statistics Theory · Mathematics 2025-10-07 Ming Gao , Bryon Aragam

Uniform Sampling for Matrix Approximation

Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time…

Data Structures and Algorithms · Computer Science 2014-08-22 Michael B. Cohen , Yin Tat Lee , Cameron Musco , Christopher Musco , Richard Peng , Aaron Sidford

Column Selection via Adaptive Sampling

Selecting a good column (or row) subset of massive data matrices has found many applications in data analysis and machine learning. We propose a new adaptive sampling algorithm that can be used to improve any relative-error column selection…

Data Structures and Algorithms · Computer Science 2015-10-15 Saurabh Paul , Malik Magdon-Ismail , Petros Drineas

A Statistical Perspective on Algorithmic Leveraging

One popular method for dealing with large-scale data sets is sampling. For example, by using the empirical statistical leverage scores as an importance sampling distribution, the method of algorithmic leveraging samples and rescales…

Methodology · Statistics 2013-06-25 Ping Ma , Michael W. Mahoney , Bin Yu

On Sampling Random Features From Empirical Leverage Scores: Implementation and Theoretical Guarantees

Random features provide a practical framework for large-scale kernel approximation and supervised learning. It has been shown that data-dependent sampling of random features using leverage scores can significantly reduce the number of…

Machine Learning · Computer Science 2019-03-21 Shahin Shahrampour , Soheil Kolouri

Subset selection for matrices in spectral norm

We address the subset selection problem for matrices, where the goal is to select a subset of $k$ columns from a "short-and-fat" matrix $X \in \mathbb{R}^{m \times n}$, such that the pseudoinverse of the sampled submatrix has as small…

Numerical Analysis · Mathematics 2025-07-29 Ivan Kozyrev , Alexander Osinsky

A Provably Accurate Randomized Sampling Algorithm for Logistic Regression

In statistics and machine learning, logistic regression is a widely-used supervised learning technique primarily employed for binary classification tasks. When the number of observations greatly exceeds the number of predictor variables, we…

Machine Learning · Statistics 2024-04-02 Agniva Chowdhury , Pradeep Ramuhalli

Subset Selection for Multiple Linear Regression via Optimization

Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that tradeoff fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming…

Machine Learning · Statistics 2020-09-04 Young Woong Park , Diego Klabjan

Provable Deterministic Leverage Score Sampling

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting a subset of its columns with the corresponding largest leverage scores results in a good low-rank matrix surrogate". To obtain…

Data Structures and Algorithms · Computer Science 2014-06-04 Dimitris Papailiopoulos , Anastasios Kyrillidis , Christos Boutsidis

Faster Subset Selection for Matrices and Applications

We study subset selection for matrices defined as follows: given a matrix $\matX \in \R^{n \times m}$ ($m > n$) and an oversampling parameter $k$ ($n \le k \le m$), select a subset of $k$ columns from $\matX$ such that the pseudo-inverse of…

Data Structures and Algorithms · Computer Science 2013-06-25 Haim Avron , Christos Boutsidis

Greedy Column Subset Selection: New Bounds and Distributed Algorithms

The problem of column subset selection has recently attracted a large body of research, with feature selection serving as one obvious and important application. Among the techniques that have been applied to solve this problem, the greedy…

Data Structures and Algorithms · Computer Science 2021-11-16 Jason Altschuler , Aditya Bhaskara , Gang Fu , Vahab Mirrokni , Afshin Rostamizadeh , Morteza Zadimoghaddam

Adaptive randomized pivoting for column subset selection, DEIM, and low-rank approximation

We derive a new adaptive leverage score sampling strategy for solving the Column Subset Selection Problem (CSSP). The resulting algorithm, called Adaptive Randomized Pivoting, can be viewed as a randomization of Osinsky's recently proposed…

Numerical Analysis · Mathematics 2025-06-23 Alice Cortinovis , Daniel Kressner

Improving prediction accuracy by choosing resampling distribution via cross-validation

In a regression model, prediction is typically performed after model selection. The large variability in the model selection makes the prediction unstable. Thus, it is essential to reduce the variability in model selection and improve…

Computation · Statistics 2024-04-11 Wataru Yoshida , Kei Hirose

On the selection of optimal subdata for big data regression based on leverage scores

The demand of computational resources for the modeling process increases as the scale of the datasets does, since traditional approaches for regression involve inverting huge data matrices. The main problem relies on the large data size,…

Methodology · Statistics 2023-07-06 Vasilis Chasiotis , Dimitris Karlis

A Universal Sampling Method for Reconstructing Signals with Simple Fourier Transforms

Reconstructing continuous signals from a small number of discrete samples is a fundamental problem across science and engineering. In practice, we are often interested in signals with 'simple' Fourier structure, such as bandlimited,…

Data Structures and Algorithms · Computer Science 2018-12-24 Haim Avron , Michael Kapralov , Cameron Musco , Christopher Musco , Ameya Velingker , Amir Zandieh

On Fast Leverage Score Sampling and Optimal Learning

Leverage score sampling provides an appealing way to perform approximate computations for large matrices. Indeed, it allows to derive faithful approximations with a complexity adapted to the problem at hand. Yet, performing leverage scores…

Machine Learning · Statistics 2019-01-25 Alessandro Rudi , Daniele Calandriello , Luigi Carratino , Lorenzo Rosasco

Optimal subsampling for large scale Elastic-net regression

Datasets with sheer volume have been generated from fields including computer vision, medical imageology, and astronomy whose large-scale and high-dimensional properties hamper the implementation of classical statistical models. To tackle…

Statistics Theory · Mathematics 2023-05-30 Hang Yu , Zhenxing Dou , Zhiwei Chen , Xiaomeng Yan

Optimal Sampling Gaps for Adaptive Submodular Maximization

Running machine learning algorithms on large and rapidly growing volumes of data is often computationally expensive, one common trick to reduce the size of a data set, and thus reduce the computational cost of machine learning algorithms,…

Machine Learning · Computer Science 2022-01-25 Shaojie Tang , Jing Yuan