Related papers: Validating XML Documents in the Streaming Model wi…
We study the problem of solving semidefinite programs (SDP) in the streaming model. Specifically, $m$ constraint matrices and a target matrix $C$, all of size $n\times n$ together with a vector $b\in \mathbb{R}^m$ are streamed to us…
Some of the most relevant document schemas used online, such as XML and JSON, have a nested format. In the last decade, the task of extracting data from nested documents over streams has become especially relevant. We focus on the streaming…
Depth first search (DFS) tree is a fundamental data structure for solving various graph problems. The classical DFS algorithm requires $O(m+n)$ time for a graph having $n$ vertices and $m$ edges. In the streaming model, an algorithm is…
Depth first search is a fundamental graph problem having a wide range of applications. For a graph $G=(V,E)$ having $n$ vertices and $m$ edges, the DFS tree can be computed in $O(m+n)$ using $O(m)$ space where $m=O(n^2)$. In the streaming…
We study the complexity of the following problems in the streaming model. Membership testing for \DLIN We show that every language in \DLIN\ can be recognised by a randomized one-pass $O(\log n)$ space algorithm with inverse polynomial…
Big data problems frequently require processing datasets in a streaming fashion, either because all data are available at once but collectively are larger than available memory or because the data intrinsically arrive one data point at a…
Problems involving the efficient arrangement of simple objects, as captured by bin packing and makespan scheduling, are fundamental tasks in combinatorial optimization. These are well understood in the traditional online and offline cases,…
We consider the classic Set Cover problem in the data stream model. For $n$ elements and $m$ sets ($m\geq n$) we give a $O(1/\delta)$-pass algorithm with a strongly sub-linear $\tilde{O}(mn^{\delta})$ space and logarithmic approximation…
We initiate a broad study of classical problems in the streaming model with insertions and deletions in the setting where we allow the approximation factor $\alpha$ to be much larger than $1$. Such algorithms can use significantly less…
Attention computation takes both the time complexity of $O(n^2)$ and the space complexity of $O(n^2)$ simultaneously, which makes deploying Large Language Models (LLMs) in streaming applications that involve long contexts requiring…
We consider the problem of estimating the value of max cut in a graph in the streaming model of computation. At one extreme, there is a trivial $2$-approximation for this problem that uses only $O(\log n)$ space, namely, count the number of…
We consider the problem of monotone, submodular maximization over a ground set of size $n$ subject to cardinality constraint $k$. For this problem, we introduce the first deterministic algorithms with linear time complexity; these…
The distinct elements problem is one of the fundamental problems in streaming algorithms --- given a stream of integers in the range $\{1,\ldots,n\}$, we wish to provide a $(1+\varepsilon)$ approximation to the number of distinct elements…
Finding dense subgraphs is a fundamental algorithmic tool in data mining, community detection, and clustering. In this problem, one aims to find an induced subgraph whose edge-to-vertex ratio is maximized. We study the directed case of this…
We propose two one-pass streaming algorithms for the $\mathcal{NP}$-hard hypergraph matching problem. The first algorithm stores a small subset of potential matching edges in a stack using dual variables to select edges. It has an…
Tracking and approximating data matrices in streaming fashion is a fundamental challenge. The problem requires more care and attention when data comes from multiple distributed sites, each receiving a stream of data. This paper considers…
We study the general problem of computing frequency-based functions, i.e., the sum of any given function of data stream frequencies. Special cases include fundamental data stream problems such as computing the number of distinct elements…
We study the classic NP-Hard problem of finding the maximum $k$-set coverage in the data stream model: given a set system of $m$ sets that are subsets of a universe $\{1,\ldots,n \}$, find the $k$ sets that cover the most number of distinct…
We present a new streaming algorithm to validate JSON documents against a set of constraints given as a JSON schema. Among the possible values a JSON document can hold, objects are unordered collections of key-value pairs while arrays are…
The eXtensible Markup Language (XML) provides a powerful and flexible means of encoding and exchanging data. As it turns out, its main advantage as an encoding format (namely, its requirement that all open and close markup tags are present…