Related papers: Column Selection via Adaptive Sampling
We consider the problem of matrix column subset selection, which selects a subset of columns from an input matrix such that the input can be well approximated by the span of the selected columns. Column subset selection has been applied to…
Adaptive randomized pivoting (ARP) is a recently proposed and highly effective algorithm for column subset selection. This paper reinterprets the ARP algorithm by drawing connections to the volume sampling distribution and active learning…
The era of huge data necessitates highly efficient machine learning algorithms. Many common machine learning algorithms, however, rely on computationally intensive subroutines that are prohibitively expensive on large datasets. Oftentimes,…
We consider the related tasks of matrix completion and matrix approximation from missing data and propose adaptive sampling procedures for both problems. We show that adaptive sampling allows one to eliminate standard incoherence…
Adaptive sampling is a useful algorithmic tool for data summarization problems in the classical centralized setting, where the entire dataset is available to the single processor performing the computation. Adaptive sampling repeatedly…
This paper examines the problem of locating outlier columns in a large, otherwise low-rank, matrix. We propose a simple two-step adaptive sensing and inference approach and establish theoretical guarantees for its performance; our results…
We propose a new randomized optimization method for high-dimensional problems which can be seen as a generalization of coordinate descent to random subspaces. We show that an adaptive sampling strategy for the random subspace significantly…
Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time…
We derive a new adaptive leverage score sampling strategy for solving the Column Subset Selection Problem (CSSP). The resulting algorithm, called Adaptive Randomized Pivoting, can be viewed as a randomization of Osinsky's recently proposed…
We study the problem of exact completion for $m \times n$ sized matrix of rank $r$ with the adaptive sampling method. We introduce a relation of the exact completion problem with the sparsest vector of column and row spaces (which we call…
The problem of column subset selection has recently attracted a large body of research, with feature selection serving as one obvious and important application. Among the techniques that have been applied to solve this problem, the greedy…
We present a matrix-factorization algorithm that scales to input matrices with both huge number of rows and columns. Learned factors may be sparse or dense and/or non-negative, which makes our algorithm suitable for dictionary learning,…
In the big data era researchers face a series of problems. Even standard approaches/methodologies, like linear regression, can be difficult or problematic with huge volumes of data. Traditional approaches for regression in big datasets may…
The problem of column subset selection asks for a subset of columns from an input matrix such that the matrix can be reconstructed as accurately as possible within the span of the selected columns. A natural extension is to consider a…
This paper addresses matrix approximation problems for matrices that are large, sparse and/or that are representations of large graphs. To tackle these problems, we consider algorithms that are based primarily on coarsening techniques,…
Learning interpretable models has become a major focus of machine learning research, given the increasing prominence of machine learning in socially important decision-making. Among interpretable models, rule lists are among the best-known…
Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that tradeoff fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming…
We propose a randomized method for solving linear programs with a large number of columns but a relatively small number of constraints. Since enumerating all the columns is usually unrealistic, such linear programs are commonly solved by…
The number of non-negative integer matrices with given row and column sums appears in a variety of problems in mathematics and statistics but no closed-form expression for it is known, so we rely on approximations of various kinds. Here we…
Subsampling algorithms are a natural approach to reduce data size before fitting models on massive datasets. In recent years, several works have proposed methods for subsampling rows from a data matrix while maintaining relevant information…