Related papers: Continuous Matrix Approximation on Distributed Dat…
We adapt a well known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives the rows of a large matrix $A \in \R^{n \times m}$ one after the other in a streaming fashion. It maintains…
Consider the continuous distributed monitoring model in which $n$ distributed nodes, receiving individual data streams, are connected to a designated server. The server is asked to continuously monitor a function defined over the values…
As massive graphs become more prevalent, there is a rapidly growing need for scalable algorithms that solve classical graph problems, such as maximum matching and minimum vertex cover, on large datasets. For massive inputs, several…
There is a growing demand for live, on-the-fly processing of increasingly large amounts of data. In order to ensure the timely and reliable processing of streaming data, a variety of distributed stream processing architectures and platforms…
Emerging applications of machine learning in numerous areas involve continuous gathering of and learning from streams of data. Real-time incorporation of streaming data into the learned models is essential for improved inference in these…
Graph pattern matching involves finding exact or approximate matches for a query subgraph in a larger graph. It has been studied extensively and has strong applications in domains such as computer vision, computational biology, social…
In this thesis, we explore streaming algorithms for approximating constraint satisfaction problems (CSPs). The setup is roughly the following: A computer has limited memory space, sees a long "stream" of local constraints on a set of…
Traditional graph-based semi-supervised learning (SSL) approaches, even though widely applied, are not suited for massive data and large label scenarios since they scale linearly with the number of edges $|E|$ and distinct labels $m$. To…
Fueled by massive data, important decision making is being automated with the help of algorithms, therefore, fairness in algorithms has become an especially important research topic. In this work, we design new streaming and distributed…
We study the classic NP-Hard problem of finding the maximum $k$-set coverage in the data stream model: given a set system of $m$ sets that are subsets of a universe $\{1,\ldots,n \}$, find the $k$ sets that cover the most number of distinct…
We initiate the study of graph algorithms in the streaming setting on massive distributed and parallel systems inspired by practical data processing systems. The objective is to design algorithms that can efficiently process evolving graphs…
In this paper, we propose a distributed algorithm for the minimum dominating set problem. For some especial networks, we prove theoretically that the achieved answer by our proposed algorithm is a constant approximation factor of the exact…
We introduce a new computational model for data streams: asymptotically exact streaming algorithms. These algorithms have an approximation ratio that tends to one as the length of the stream goes to infinity while the memory used by the…
Several high-throughput distributed data-processing applications require multi-hop processing of streams of data. These applications include continual processing on data streams originating from a network of sensors, composing a multimedia…
Big data problems frequently require processing datasets in a streaming fashion, either because all data are available at once but collectively are larger than available memory or because the data intrinsically arrive one data point at a…
We study network response to queries that require computation of remotely located data and seek to characterize the performance limits in terms of maximum sustainable query rate that can be satisfied. The available resources include (i) a…
We define the Streaming Communication model that combines the main aspects of communication complexity and streaming. We consider two agents that want to compute some function that depends on inputs that are distributed to each agent. The…
Given a dataset of points in a metric space and an integer $k$, a diversity maximization problem requires determining a subset of $k$ points maximizing some diversity objective measure, e.g., the minimum or the average distance between two…
Triangle counting and sampling are two fundamental problems for streaming algorithms. Arguably, designing sampling algorithms is more challenging than their counting variants. It may be noted that triangle counting has received far greater…
We have a set of processors (or agents) and a set of graph networks defined over some vertex set. Each processor can access a subset of the graph networks. Each processor has a demand specified as a pair of vertices $<u, v>$, along with a…