Related papers

Related papers: Streaming and Distributed Algorithms for Robust Co…

200 papers

Work on approximate linear algebra has led to efficient distributed and streaming algorithms for problems such as approximate matrix multiplication, low rank approximation, and regression, primarily for the Euclidean norm $\ell_2$. We study…

Data Structures and Algorithms · Computer Science · 2018-07-10 · Graham Cormode, Charlie Dickens, David P. Woodruff

Most known algorithms in the streaming model of computation aim to approximate a single function such as an $\ell_p$-norm. In 2009, Nelson [https://sublinear.info, Open Problem 30] asked if it is possible to design universal…

Data Structures and Algorithms · Computer Science · 2020-04-07 · Vladimir Braverman, Robert Krauthgamer, Lin F. Yang

Subset selection for the rank $k$ approximation of an $n\times d$ matrix $A$ offers improvements in the interpretability of matrices, as well as a variety of computational savings. This problem is well-understood when the error measure is…

Data Structures and Algorithms · Computer Science · 2023-04-20 · David P. Woodruff, Taisuke Yasuda

We study $\ell_p$ sampling and frequency moment estimation in a single-pass insertion-only data stream. For $p \in (0,2)$, we present a nearly space-optimal approximate $\ell_p$ sampler that uses $\widetilde{O}(\log n \log(1/\delta))$ bits…

Data Structures and Algorithms · Computer Science · 2026-04-07 · Honghao Lin, Hoai-An Nguyen, William Swartworth, David P. Woodruff
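The abstract above concerns $\ell_p$ sampling, where the goal is to return index $i$ with probability $|x_i|^p / \|x\|_p^p$. The following minimal Python sketch shows only that target distribution, and it assumes the whole vector fits in memory. It uses an exponential-scaling trick: with independent $e_i \sim \mathrm{Exp}(1)$, the index minimizing $e_i / |x_i|^p$ follows exactly the $\ell_p$ distribution. The function name `lp_sample` is hypothetical. This is not the space-efficient streaming sampler from the listed paper.

```python
import random
from typing import Dict, Optional


def lp_sample(x: Dict[int, float], p: float, rng: random.Random) -> Optional[int]:
    """Return i with probability |x_i|^p / sum_j |x_j|^p, or None if x is all zero.

    Exponential-scaling trick (works for any p > 0): draw e_i ~ Exp(1)
    independently and return argmin_i e_i / |x_i|^p. Each ratio is exponential
    with rate |x_i|^p, and the minimum of independent exponentials lands on
    index i with probability proportional to its rate.
    """
    best_index: Optional[int] = None
    best_key = float("inf")
    for i, xi in x.items():
        if xi == 0:
            continue  # zero coordinates are never sampled
        key = rng.expovariate(1.0) / abs(xi) ** p
        if key < best_key:
            best_index, best_key = i, key
    return best_index


if __name__ == "__main__":
    rng = random.Random(0)
    x = {0: 1.0, 1: -2.0, 2: 0.0, 3: 1.0}
    trials = 20000
    counts = {i: 0 for i in x}
    for _ in range(trials):
        counts[lp_sample(x, 1.0, rng)] += 1
    # With p = 1 the target probabilities are 1/4, 1/2, 0 and 1/4.
    print({i: c / trials for i, c in counts.items()})
```

A streaming sampler has to reproduce this distribution without storing $x$. The offline version only makes precise what "sample proportionally to $|x_i|^p$" means.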

We consider the problem of selecting the best subset of exactly $k$ columns from an $m \times n$ matrix $A$. We present and analyze a novel two-stage algorithm that runs in $O(\min\{mn^2,m^2n\})$ time and returns as output an $m \times k$…

Data Structures and Algorithms · Computer Science · 2015-03-13 · Christos Boutsidis, Michael W. Mahoney, Petros Drineas

In this paper, we develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling each element of the data stream, our algorithm enjoys the tightest…

Machine Learning · Computer Science · 2018-02-21 · Moran Feldman, Amin Karbasi, Ehsan Kazemi

We consider the problem of monotone, submodular maximization over a ground set of size $n$ subject to a cardinality constraint $k$. For this problem, we introduce the first deterministic algorithms with linear time complexity; these…

Data Structures and Algorithms · Computer Science · 2021-03-09 · Alan Kuhnle
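The cardinality-constrained problem in the abstract above has a classical baseline, the greedy algorithm of Nemhauser, Wolsey, and Fisher. It gives a $(1 - 1/e)$-approximation and uses $O(nk)$ evaluations of $f$. Below is a minimal Python sketch of that baseline. It takes set coverage as an assumed example objective, and `greedy_max` and `coverage` are hypothetical names. It is not the linear-time deterministic algorithm the abstract introduces.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set


def greedy_max(ground: Sequence[Hashable],
               f: Callable[[List[Hashable]], float],
               k: int) -> List[Hashable]:
    """Greedy for a monotone submodular f under the constraint |S| <= k.

    Each round adds the element with the largest marginal gain
    f(S + [e]) - f(S); ties go to the earliest element in `ground`.
    """
    chosen: List[Hashable] = []
    for _ in range(min(k, len(ground))):
        base = f(chosen)
        best, best_gain = None, float("-inf")
        for e in ground:
            if e in chosen:
                continue
            gain = f(chosen + [e]) - base
            if gain > best_gain:
                best, best_gain = e, gain
        chosen.append(best)
    return chosen


def coverage(sets: Dict[str, Set[int]]) -> Callable[[List[str]], float]:
    """Coverage f(S) = |union of the sets named in S|, monotone and submodular."""
    def f(selection: List[str]) -> float:
        covered: Set[int] = set()
        for name in selection:
            covered |= sets[name]
        return float(len(covered))
    return f


if __name__ == "__main__":
    sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}
    f = coverage(sets)
    picked = greedy_max(list(sets), f, k=2)
    print(picked, f(picked))  # ['c', 'a'] 7.0
```

Each of the $k$ rounds scans the entire ground set. That $nk$ query cost is what linear-time algorithms aim to reduce.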

We study the problem of entrywise $\ell_1$ low rank approximation. We give the first polynomial time column subset selection-based $\ell_1$ low rank approximation algorithm sampling $\tilde{O}(k)$ columns and achieving an…

Data Structures and Algorithms · Computer Science · 2020-11-17 · Arvind V. Mahankali, David P. Woodruff

We study the low rank approximation problem of any given matrix $A$ over $\mathbb{R}^{n\times m}$ and $\mathbb{C}^{n\times m}$ in entry-wise $\ell_p$ loss, that is, finding a rank-$k$ matrix $X$ such that $\|A-X\|_p$ is minimized. Unlike…

Machine Learning · Computer Science · 2019-10-31 · Chen Dan, Hong Wang, Hongyang Zhang, Yuchen Zhou, Pradeep Ravikumar

We study streaming algorithms for the $\ell_p$ subspace approximation problem. Given points $a_1, \ldots, a_n$ as an insertion-only stream and a rank parameter $k$, the $\ell_p$ subspace approximation problem is to find a $k$-dimensional…

Data Structures and Algorithms · Computer Science · 2024-06-06 · Hossein Esfandiari, Vahab Mirrokni, Praneeth Kacham, David P. Woodruff, Peilin Zhong

In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these…

Data Structures and Algorithms · Computer Science · 2024-06-12 · Rajesh Jayaram, David P. Woodruff, Samson Zhou

In many problems in data mining and machine learning, data items that need to be clustered or classified are not points in a high-dimensional space, but are distributions (points on a high-dimensional simplex). For distributions, natural…

Data Structures and Algorithms · Computer Science · 2007-07-13 · Sudipto Guha, Andrew McGregor, Suresh Venkatasubramanian

The problem of column subset selection has recently attracted a large body of research, with feature selection serving as one obvious and important application. Among the techniques that have been applied to solve this problem, the greedy…

Data Structures and Algorithms · Computer Science · 2021-11-16 · Jason Altschuler, Aditya Bhaskara, Gang Fu, Vahab Mirrokni, Afshin Rostamizadeh, Morteza Zadimoghaddam
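The abstract above mentions greedy techniques for column subset selection. The following minimal single-machine Python sketch applies the standard greedy rule under Frobenius-norm error, and it assumes the matrix is given as a list of columns. At each step it picks the column whose residual, once added to the span, removes the most residual energy, and then it projects that direction out of every column. The sketch illustrates only the generic greedy heuristic and is not the distributed algorithm of the listed paper. The names `greedy_css` and `_dot` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def greedy_css(columns: Sequence[Sequence[float]], k: int,
               tol: float = 1e-12) -> Tuple[List[int], float]:
    """Greedily choose up to k column indices to reduce ||A - P_S A||_F^2.

    Residuals r_i stay orthogonal to the span of the chosen columns, so adding
    column j (residual r_j) lowers the squared error by
    sum_i (r_j . r_i)^2 / ||r_j||^2.
    Returns (chosen indices, remaining squared Frobenius error).
    """
    residuals: List[Vector] = [list(c) for c in columns]
    chosen: List[int] = []
    for _ in range(k):
        best_j, best_score = -1, -1.0
        for j, rj in enumerate(residuals):
            norm2 = _dot(rj, rj)
            if j in chosen or norm2 <= tol:
                continue  # already chosen, or (numerically) inside the span
            score = sum(_dot(rj, ri) ** 2 for ri in residuals) / norm2
            if score > best_score:
                best_j, best_score = j, score
        if best_j < 0:
            break  # every remaining column already lies in the span
        chosen.append(best_j)
        rj = residuals[best_j]
        norm = math.sqrt(_dot(rj, rj))
        q = [v / norm for v in rj]
        # Gram-Schmidt step: project the new unit direction q out of each residual.
        new_residuals: List[Vector] = []
        for ri in residuals:
            c = _dot(q, ri)
            new_residuals.append([a - c * b for a, b in zip(ri, q)])
        residuals = new_residuals
    error = sum(_dot(r, r) for r in residuals)
    return chosen, error


if __name__ == "__main__":
    cols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.1]]
    print(greedy_css(cols, 1))  # picks column 2 = (1, 1, 0); error close to 1.01
```

Each step costs $O(n^2 m)$ for $n$ columns of length $m$. This sketch makes no attempt at the scalability that the listed work targets.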

The problem of estimating the $p$th moment $F_p$ ($p$ nonnegative and real) in data streams is as follows. There is a vector $x$ which starts at $0$, and many updates of the form $x_i \leftarrow x_i + v$ come sequentially in a stream. The algorithm also…

Data Structures and Algorithms · Computer Science · 2009-04-09 · Daniel M. Kane, Jelani Nelson, David P. Woodruff
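The abstract above defines the turnstile model: $x$ starts at $0$, each update $(i, v)$ sets $x_i \leftarrow x_i + v$, and $F_p = \sum_i |x_i|^p$. The following minimal Python sketch makes that model concrete by comparing two approaches. The first is an exact baseline that stores $x$ in linear space. The second is the classical estimator of Alon, Matias, and Szegedy for the special case $p = 2$, which keeps only $k$ linear counters. The class names are hypothetical. Caching a random sign per index is a simplification: a real sketch derives the signs from a 4-wise independent hash so that it uses sublinear space. This is not the algorithm of the listed paper.

```python
import random
from collections import defaultdict
from typing import Dict, Tuple


class ExactTurnstile:
    """Linear-space baseline that stores x explicitly (hypothetical helper)."""

    def __init__(self) -> None:
        self.x: Dict[int, float] = defaultdict(float)

    def update(self, i: int, v: float) -> None:
        self.x[i] += v  # x_i <- x_i + v; v may be negative (turnstile)

    def frequency_moment(self, p: float) -> float:
        # F_p = sum_i |x_i|^p; zero coordinates are skipped so F_0 counts nonzeros.
        return sum(abs(xi) ** p for xi in self.x.values() if xi != 0)


class AMSF2Sketch:
    """AMS-style F_2 estimator with k counters Z_j = sum_i s_j(i) * x_i.

    Simplification: the random signs s_j(i) are drawn lazily and cached per
    index, which costs linear space; a real sketch computes them from a
    4-wise independent hash function instead.
    """

    def __init__(self, k: int, seed: int = 0) -> None:
        self.k = k
        self.rng = random.Random(seed)
        self.signs: Dict[int, Tuple[int, ...]] = {}
        self.z = [0.0] * k

    def _signs_for(self, i: int) -> Tuple[int, ...]:
        if i not in self.signs:
            self.signs[i] = tuple(self.rng.choice((-1, 1)) for _ in range(self.k))
        return self.signs[i]

    def update(self, i: int, v: float) -> None:
        # Linear update: counter j moves by s_j(i) * v.
        for j, s in enumerate(self._signs_for(i)):
            self.z[j] += s * v

    def estimate(self) -> float:
        # Each Z_j^2 is an unbiased estimate of F_2; averaging reduces variance.
        return sum(zj * zj for zj in self.z) / self.k


if __name__ == "__main__":
    stream = [(1, 3.0), (2, -1.0), (1, -1.0), (5, 4.0)]
    exact, sketch = ExactTurnstile(), AMSF2Sketch(k=64, seed=0)
    for i, v in stream:
        exact.update(i, v)
        sketch.update(i, v)
    print(exact.frequency_moment(2))  # 21.0, since (x_1, x_2, x_5) = (2, -1, 4)
    print(sketch.estimate())          # a randomized estimate of 21.0
```

Because the counters are linear in $x$, deletions cancel insertions exactly. That property is what lets this style of sketch handle negative updates.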

We study the column subset selection problem with respect to the entrywise $\ell_1$-norm loss. It is known that in the worst case, to obtain a good rank-$k$ approximation to a matrix, one needs an arbitrarily large $n^{\Omega(1)}$ number of…

Data Structures and Algorithms · Computer Science · 2020-04-20 · Zhao Song, David P. Woodruff, Peilin Zhong

Recent progress in (semi-)streaming algorithms for monotone submodular function maximization has led to tight results for a simple cardinality constraint. However, current techniques fail to give a similar understanding for natural…

Data Structures and Algorithms · Computer Science · 2022-02-17 · Moran Feldman, Paul Liu, Ashkan Norouzi-Fard, Ola Svensson, Rico Zenklusen

Frequency estimation in data streams is one of the classical problems in streaming algorithms. Following much research, there are now almost matching upper and lower bounds for the trade-off needed between the number of samples and the…

Computational Complexity · Computer Science · 2023-01-16 · Shachar Lovett, Jiapeng Zhang

Histograms, i.e., piecewise constant approximations, are a popular tool used to represent data distributions. Traditionally, the difference between the histogram and the underlying distribution (i.e., the approximation error) is measured…

Data Structures and Algorithms · Computer Science · 2022-07-19 · Justin Y. Chen, Piotr Indyk, Tal Wagner
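The histogram abstract above describes piecewise constant approximations with a fixed error measure. The listing cuts off before naming that measure. One classical choice is squared error, which leads to V-optimal histograms. The following minimal Python sketch computes such a histogram with the textbook dynamic program, which runs in $O(n^2 B)$ time for $B$ buckets and uses prefix sums for each bucket's cost. The name `v_optimal` is hypothetical. This is not the method of the listed paper.

```python
from typing import List, Tuple


def v_optimal(data: List[float], buckets: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Best piecewise-constant fit with at most `buckets` pieces under squared error.

    Returns (sse, ranges), where ranges are half-open (start, end) index pairs.
    Each bucket is represented by its mean, which minimizes its squared error.
    """
    n = len(data)
    if n == 0:
        return 0.0, []
    # Prefix sums of values and squares give each bucket's SSE in O(1).
    s = [0.0] * (n + 1)
    sq = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s[i + 1] = s[i] + v
        sq[i + 1] = sq[i] + v * v

    def sse(a: int, b: int) -> float:
        # Squared error of data[a:b] around its mean.
        total = s[b] - s[a]
        return (sq[b] - sq[a]) - total * total / (b - a)

    b_max = min(buckets, n)
    inf = float("inf")
    # cost[b][j]: best SSE for data[:j] using exactly b buckets; back[b][j]: last split.
    cost = [[inf] * (n + 1) for _ in range(b_max + 1)]
    back = [[0] * (n + 1) for _ in range(b_max + 1)]
    cost[0][0] = 0.0
    for b in range(1, b_max + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                c = cost[b - 1][i] + sse(i, j)
                if c < cost[b][j]:
                    cost[b][j], back[b][j] = c, i
    best_b = min(range(1, b_max + 1), key=lambda b: cost[b][n])
    # Walk the back-pointers to recover bucket boundaries.
    ranges, j = [], n
    for b in range(best_b, 0, -1):
        i = back[b][j]
        ranges.append((i, j))
        j = i
    ranges.reverse()
    # Clamp tiny negative values caused by floating-point cancellation.
    return max(cost[best_b][n], 0.0), ranges


if __name__ == "__main__":
    print(v_optimal([1, 1, 1, 5, 5, 5], 2))  # (0.0, [(0, 3), (3, 6)])
```

Other error measures change only the per-bucket cost function `sse` and the choice of representative value, while the dynamic program stays the same.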

We consider the problem of maximizing a nonnegative submodular set function $f:2^{\mathcal{N}} \rightarrow \mathbb{R}^+$ subject to a $p$-matchoid constraint in the single-pass streaming setting. Previous work in this context has considered…

Data Structures and Algorithms · Computer Science · 2015-05-01 · Chandra Chekuri, Shalmoli Gupta, Kent Quanrud

We develop a streaming (one-pass, bounded-memory) word embedding algorithm based on the canonical skip-gram with negative sampling algorithm implemented in word2vec. We compare our streaming algorithm to word2vec empirically by measuring…

Computation and Language · Computer Science · 2017-04-26 · Chandler May, Kevin Duh, Benjamin Van Durme, Ashwin Lall