Related papers: Streaming and Distributed Algorithms for Robust Co…

Leveraging Well-Conditioned Bases: Streaming & Distributed Summaries in Minkowski $p$-Norms

Work on approximate linear algebra has led to efficient distributed and streaming algorithms for problems such as approximate matrix multiplication, low rank approximation, and regression, primarily for the Euclidean norm $\ell_2$. We study…

Data Structures and Algorithms · Computer Science 2018-07-10 Graham Cormode, Charlie Dickens, David P. Woodruff

Universal Streaming of Subset Norms

Most known algorithms in the streaming model of computation aim to approximate a single function such as an $\ell_p$-norm. In 2009, Nelson [\url{https://sublinear.info}, Open Problem 30] asked if it is possible to design \emph{universal…

Data Structures and Algorithms · Computer Science 2020-04-07 Vladimir Braverman, Robert Krauthgamer, Lin F. Yang

New Subset Selection Algorithms for Low Rank Approximation: Offline and Online

Subset selection for the rank $k$ approximation of an $n\times d$ matrix $A$ offers improvements in the interpretability of matrices, as well as a variety of computational savings. This problem is well-understood when the error measure is…

Data Structures and Algorithms · Computer Science 2023-04-20 David P. Woodruff, Taisuke Yasuda

Unbiased Insights: Optimal Streaming Algorithms for $\ell_p$ Sampling, the Forget Model, and Beyond

We study $\ell_p$ sampling and frequency moment estimation in a single-pass insertion-only data stream. For $p \in (0,2)$, we present a nearly space-optimal approximate $\ell_p$ sampler that uses $\widetilde{O}(\log n \log(1/\delta))$ bits…

Data Structures and Algorithms · Computer Science 2026-04-07 Honghao Lin, Hoai-An Nguyen, William Swartworth, David P. Woodruff

An Improved Approximation Algorithm for the Column Subset Selection Problem

We consider the problem of selecting the best subset of exactly $k$ columns from an $m \times n$ matrix $A$. We present and analyze a novel two-stage algorithm that runs in $O(\min\{mn^2,m^2n\})$ time and returns as output an $m \times k$…

Data Structures and Algorithms · Computer Science 2015-03-13 Christos Boutsidis, Michael W. Mahoney, Petros Drineas

Do Less, Get More: Streaming Submodular Maximization with Subsampling

In this paper, we develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling each element of the data stream, our algorithm enjoys the tightest…

Machine Learning · Computer Science 2018-02-21 Moran Feldman, Amin Karbasi, Ehsan Kazemi

Quick Streaming Algorithms for Maximization of Monotone Submodular Functions in Linear Time

We consider the problem of monotone, submodular maximization over a ground set of size $n$ subject to cardinality constraint $k$. For this problem, we introduce the first deterministic algorithms with linear time complexity; these…

Data Structures and Algorithms · Computer Science 2021-03-09 Alan Kuhnle

Optimal $\ell_1$ Column Subset Selection and a Fast PTAS for Low Rank Approximation

We study the problem of entrywise $\ell_1$ low rank approximation. We give the first polynomial time column subset selection-based $\ell_1$ low rank approximation algorithm sampling $\tilde{O}(k)$ columns and achieving an…

Data Structures and Algorithms · Computer Science 2020-11-17 Arvind V. Mahankali, David P. Woodruff

Optimal Analysis of Subset-Selection Based L_p Low Rank Approximation

We study the low rank approximation problem of any given matrix $A$ over $\mathbb{R}^{n\times m}$ and $\mathbb{C}^{n\times m}$ in entry-wise $\ell_p$ loss, that is, finding a rank-$k$ matrix $X$ such that $\|A-X\|_p$ is minimized. Unlike…

Machine Learning · Computer Science 2019-10-31 Chen Dan, Hong Wang, Hongyang Zhang, Yuchen Zhou, Pradeep Ravikumar

High-Dimensional Geometric Streaming for Nearly Low Rank Data

We study streaming algorithms for the $\ell_p$ subspace approximation problem. Given points $a_1, \ldots, a_n$ as an insertion-only stream and a rank parameter $k$, the $\ell_p$ subspace approximation problem is to find a $k$-dimensional…

Data Structures and Algorithms · Computer Science 2024-06-06 Hossein Esfandiari, Vahab Mirrokni, Praneeth Kacham, David P. Woodruff, Peilin Zhong

Streaming Algorithms with Few State Changes

In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these…

Data Structures and Algorithms · Computer Science 2024-06-12 Rajesh Jayaram, David P. Woodruff, Samson Zhou

Streaming and Sublinear Approximation of Entropy and Information Distances

In many problems in data mining and machine learning, data items that need to be clustered or classified are not points in a high-dimensional space, but are distributions (points on a high dimensional simplex). For distributions, natural…

Data Structures and Algorithms · Computer Science 2007-07-13 Sudipto Guha, Andrew McGregor, Suresh Venkatasubramanian

Greedy Column Subset Selection: New Bounds and Distributed Algorithms

The problem of column subset selection has recently attracted a large body of research, with feature selection serving as one obvious and important application. Among the techniques that have been applied to solve this problem, the greedy…

Data Structures and Algorithms · Computer Science 2021-11-16 Jason Altschuler, Aditya Bhaskara, Gang Fu, Vahab Mirrokni, Afshin Rostamizadeh, Morteza Zadimoghaddam

Revisiting Norm Estimation in Data Streams

The problem of estimating the $p$th moment $F_p$ ($p$ nonnegative and real) in data streams is as follows. There is a vector $x$ which starts at 0, and many updates of the form $x_i \leftarrow x_i + v$ come sequentially in a stream. The algorithm also…

Data Structures and Algorithms · Computer Science 2009-04-09 Daniel M. Kane, Jelani Nelson, David P. Woodruff

Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss

We study the column subset selection problem with respect to the entrywise $\ell_1$-norm loss. It is known that in the worst case, to obtain a good rank-$k$ approximation to a matrix, one needs an arbitrarily large $n^{\Omega(1)}$ number of…

Data Structures and Algorithms · Computer Science 2020-04-20 Zhao Song, David P. Woodruff, Peilin Zhong

Streaming Submodular Maximization under Matroid Constraints

Recent progress in (semi-)streaming algorithms for monotone submodular function maximization has led to tight results for a simple cardinality constraint. However, current techniques fail to give a similar understanding for natural…

Data Structures and Algorithms · Computer Science 2022-02-17 Moran Feldman, Paul Liu, Ashkan Norouzi-Fard, Ola Svensson, Rico Zenklusen

Streaming Lower Bounds and Asymmetric Set-Disjointness

Frequency estimation in data streams is one of the classical problems in streaming algorithms. Following much research, there are now almost matching upper and lower bounds for the trade-off needed between the number of samples and the…

Computational Complexity · Computer Science 2023-01-16 Shachar Lovett, Jiapeng Zhang

Streaming Algorithms for Support-Aware Histograms

Histograms, i.e., piece-wise constant approximations, are a popular tool used to represent data distributions. Traditionally, the difference between the histogram and the underlying distribution (i.e., the approximation error) is measured…

Data Structures and Algorithms · Computer Science 2022-07-19 Justin Y. Chen, Piotr Indyk, Tal Wagner

Streaming Algorithms for Submodular Function Maximization

We consider the problem of maximizing a nonnegative submodular set function $f:2^{\mathcal{N}} \rightarrow \mathbb{R}^+$ subject to a $p$-matchoid constraint in the single-pass streaming setting. Previous work in this context has considered…

Data Structures and Algorithms · Computer Science 2015-05-01 Chandra Chekuri, Shalmoli Gupta, Kent Quanrud

Streaming Word Embeddings with the Space-Saving Algorithm

We develop a streaming (one-pass, bounded-memory) word embedding algorithm based on the canonical skip-gram with negative sampling algorithm implemented in word2vec. We compare our streaming algorithm to word2vec empirically by measuring…

Computation and Language · Computer Science 2017-04-26 Chandler May, Kevin Duh, Benjamin Van Durme, Ashwin Lall