Related papers: A Near-Optimal Algorithm for L1-Difference
We study sketching and streaming algorithms for the Longest Common Subsequence problem (LCS) on strings of small alphabet size $|\Sigma|$. For the problem of deciding whether the LCS of strings $x,y$ has length at least $L$, we obtain a…
This paper resolves one of the longest standing basic problems in the streaming computational model. Namely, optimal construction of quantile sketches. An $\varepsilon$ approximate quantile sketch receives a stream of items $x_1,\ldots,x_n$…
We resolve the space complexity of linear sketches for approximating the maximum matching problem in dynamic graph streams where the stream may include both edge insertion and deletion. Specifically, we show that for any $\epsilon > 0$,…
We initiate a broad study of classical problems in the streaming model with insertions and deletions in the setting where we allow the approximation factor $\alpha$ to be much larger than $1$. Such algorithms can use significantly less…
We consider statistical as well as algorithmic aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. For a LS problem with input data $(X, Y) \in \mathbb{R}^{n \times p} \times \mathbb{R}^n$,…
Maximum coverage and minimum set cover problems --collectively called coverage problems-- have been studied extensively in streaming models. However, previous research not only achieve sub-optimal approximation factors and space…
We consider the problem of computing a $(1+\epsilon)$-approximation of the Hamming distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this…
Edit distance is an important measure of string similarity. It counts the number of insertions, deletions and substitutions one has to make to a string $x$ to get a string $y$. In this paper we design an almost linear-size sketching scheme…
The metric sketching problem is defined as follows. Given a metric on $n$ points, and $\epsilon>0$, we wish to produce a small size data structure (sketch) that, given any pair of point indices, recovers the distance between the points up…
We study the problem of solving semidefinite programs (SDP) in the streaming model. Specifically, $m$ constraint matrices and a target matrix $C$, all of size $n\times n$ together with a vector $b\in \mathbb{R}^m$ are streamed to us…
The majority of streaming problems are defined and analyzed in a static setting, where the data stream is any worst-case sequence of insertions and deletions that is fixed in advance. However, many real-world applications require a more…
We present faster algorithms for approximate maximum flow in undirected graphs with good separator structures, such as bounded genus, minor free, and geometric graphs. Given such a graph with $n$ vertices, $m$ edges along with a recursive…
We adapt a well known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives the rows of a large matrix $A \in \R^{n \times m}$ one after the other in a streaming fashion. It maintains…
We present the first sublinear memory sketch that can be queried to find the nearest neighbors in a dataset. Our online sketching algorithm compresses an N element dataset to a sketch of size $O(N^b \log^3 N)$ in $O(N^{(b+1)} \log^3 N)$…
Estimating ranks, quantiles, and distributions over streaming data is a central task in data analysis and monitoring. Given a stream of $n$ items from a data universe equipped with a total order, the task is to compute a sketch (data…
We improve upon previous oblivious sketching and turnstile streaming results for $\ell_1$ and logistic regression, giving a much smaller sketching dimension achieving $O(1)$-approximation and yielding an efficient optimization problem in…
We develop an algorithm for estimating the values of a vector x in R^n over a support S of size k from a randomized sparse binary linear sketch Ax of size O(k). Given Ax and S, we can recover x' with ||x' - x_S||_2 <= eps ||x - x_S||_2 with…
Sketching is one of the most fundamental tools in large-scale machine learning. It enables runtime and memory saving via randomly compressing the original large problem into lower dimensions. In this paper, we propose a novel sketching…
The problem of estimating frequency moments of a data stream has attracted a lot of attention since the onset of streaming algorithms [AMS99]. While the space complexity for approximately computing the $p^{\rm th}$ moment, for $p\in(0,2]$…
LP-type problems such as the Minimum Enclosing Ball (MEB), Linear Support Vector Machine (SVM), Linear Programming (LP), and Semidefinite Programming (SDP) are fundamental combinatorial optimization problems, with many important…