Related papers: Tight Lower Bounds for Query Processing on Streami…
We study the randomized version of a computation model (introduced by Grohe, Koch, and Schweikardt (ICALP'05); Grohe and Schweikardt (PODS'05)) that restricts random access to external memory and internal memory space. Essentially, this…
We consider unidirectional data streams with restricted access, such as read-only and write-only streams. For read-write streams, we also introduce a new complexity measure called expansion, the ratio between the space used on the stream…
For many algorithmic problems, traditional algorithms that optimise on the number of instructions executed prove expensive on I/Os. Novel and very different design techniques, when applied to these problems, can produce algorithms that are…
Sorting extremely large datasets is a frequently occuring task in practice. These datasets are usually much larger than the computer's main memory; thus external memory sorting algorithms, first introduced by Aggarwal and Vitter (1988), are…
We study the problem of list ranking in the parallel external memory (PEM) model. We observe an interesting dual nature for the hardness of the problem due to limited information exchange among the processors about the structure of the…
A priority queue is a fundamental data structure that maintains a dynamic ordered set of keys and supports the followig basic operations: insertion of a key, deletion of a key, and finding the smallest key. The complexity of the priority…
The Hierarchical Clustering (HC) problem consists of building a hierarchy of clusters to represent a given dataset. Motivated by the modern large-scale applications, we study the problem in the \streaming model, in which the memory is…
We engineer algorithms for sorting huge data sets on massively parallel machines. The algorithms are based on the multiway merging paradigm. We first outline an algorithm whose I/O requirement is close to a lower bound. Thus, in contrast to…
Algorithms in the data stream model use $O(polylog(N))$ space to compute some property of an input of size $N$, and many of these algorithms are implemented and used in practice. However, sketching algorithms in the graph semi-streaming…
Data stream algorithms tackle operations on high-volume sequences of read-once data items. Data stream scenarios include inherently real-time systems like sensor networks and financial markets. They also arise in purely-computational…
One of the biggest open problems in external memory data structures is the priority queue problem with DecreaseKey operations. If only Insert and ExtractMin operations need to be supported, one can design a comparison-based priority queue…
Many streaming algorithms provide only a high-probability relative approximation. These two relaxations, of allowing approximation and randomization, seem necessary -- for many streaming problems, both relaxations must be employed…
In his breakthrough paper, Raz showed that any parity learning algorithm requires either quadratic memory or an exponential number of samples [FOCS'16, JACM'19]. A line of work that followed extended this result to a large class of learning…
In the Max-Cut problem in the streaming model, an algorithm is given the edges of an unknown graph $G = (V,E)$ in some fixed order, and its goal is to approximate the size of the largest cut in $G$. Improving upon an earlier result of…
Many well-known, real-world problems involve dynamic data which describe the relationship among the entities. Hypergraphs are powerful combinatorial structures that are frequently used to model such data. For many of today's data-centric…
We address the problem of efficient verification of multi-threaded programs running over Total Store Order (TSO) memory model. It has been shown that even with finite data domain programs, the complexity of control state reachability under…
Although many authors have considered how many ternary comparisons it takes to sort a multiset $S$ of size $n$, the best known upper and lower bounds still differ by a term linear in $n$. In this paper we restrict our attention to online…
Clustering is an unsupervised learning technique in which data or objects are grouped into sets based on some similarity measure. Most of the clustering algorithms assume that the main memory is infinite and can accommodate the set of…
The problem of (approximately) counting the number of triangles in a graph is one of the basic problems in graph theory. In this paper we study the problem in the streaming model. We study the amount of memory required by a randomized…
We initiate the study of biological neural networks from the perspective of streaming algorithms. Like computers, human brains suffer from memory limitations which pose a significant obstacle when processing large scale and dynamically…