Related papers: Lower Bounds for Multi-Pass Processing of Multiple…
We study the problem of minimizing total completion time on parallel machines subject to varying processing capacity. In this paper, we develop an approximation scheme for the problem under the data stream model where the input data is…
Modern data analytic workloads increasingly require handling multiple data models simultaneously. Two primary approaches meet this need: polyglot persistence and multi-model database systems. Polyglot persistence employs a coordinator…
The seminal work of Ahn, Guha, and McGregor in 2012 introduced the graph sketching technique and used it to present the first streaming algorithms for various graph problems over dynamic streams with both insertions and deletions of edges.…
We introduce the poly-streaming model, a generalization of streaming models of computation in which $k$ processors process $k$ data streams containing a total of $N$ items. The algorithm is allowed $O\left(f(k)\cdot M_1\right)$ space, where…
One of the most popular approaches to multi-target tracking is tracking-by-detection. Current min-cost flow algorithms which solve the data association problem optimally have three main drawbacks: they are computationally expensive, they…
In the semi-streaming model for processing massive graphs, an algorithm makes multiple passes over the edges of a given $n$-vertex graph and is tasked with computing the solution to a problem using $O(n \cdot \text{polylog}(n))$ space.…
We introduce BriskStream, an in-memory data stream processing system (DSPSs) specifically designed for modern shared-memory multicore architectures. BriskStream's key contribution is an execution plan optimization paradigm, namely RLAS,…
Data stream processing systems (DSPSs) enable users to express and run stream applications to continuously process data streams. To achieve real-time data analytics, recent researches keep focusing on optimizing the system latency and…
We initiate the study of graph algorithms in the streaming setting on massive distributed and parallel systems inspired by practical data processing systems. The objective is to design algorithms that can efficiently process evolving graphs…
Through legislation and technical advances users gain more control over how their data is processed, and they expect online services to respect their privacy choices and preferences. However, data may be processed for many different…
Motivated by recent progress on symmetry breaking problems such as maximal independent set (MIS) and maximal matching in the low-memory Massively Parallel Computation (MPC) model (e.g., Behnezhad et al.~PODC 2019; Ghaffari-Uitto SODA 2019),…
The problem of missing data, usually absent incurated and competition-standard datasets, is an unfortunate reality for most machine learning models used in industry applications. Recent work has focused on understanding the nature and the…
We present new lower bounds that show that a polynomial number of passes are necessary for solving some fundamental graph problems in the streaming model of computation. For instance, we show that any streaming algorithm that finds a…
Emerging applications of machine learning in numerous areas involve continuous gathering of and learning from streams of data. Real-time incorporation of streaming data into the learned models is essential for improved inference in these…
In recent years there has been a growing interest in developing "streaming algorithms" for efficient processing and querying of continuous data streams. These algorithms seek to provide accurate results while minimizing the required storage…
The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software…
The problem of (approximately) counting the number of triangles in a graph is one of the basic problems in graph theory. In this paper we study the problem in the streaming model. We study the amount of memory required by a randomized…
With the explosion of the size of digital dataset, the limiting factor for decomposition algorithms is the \emph{number of passes} over the input, as the input is often stored out-of-core or even off-site. Moreover, we're only interested in…
We introduce a new notion of information complexity for multi-pass streaming problems and use it to resolve several important questions in data streams. In the coin problem, one sees a stream of $n$ i.i.d. uniform bits and one would like to…
Memory disaggregation addresses memory imbalance in a cluster by decoupling CPU and memory allocations of applications while also increasing the effective memory capacity for (memory-intensive) applications beyond the local memory limit…