Related papers: Tight Lower Bounds for Query Processing on Streami…

Randomized Computations on Large Data Sets: Tight Lower Bounds

We study the randomized version of a computation model (introduced by Grohe, Koch, and Schweikardt (ICALP'05); Grohe and Schweikardt (PODS'05)) that restricts random access to external memory and internal memory space. Essentially, this…

Databases · Computer Science 2007-05-23 Martin Grohe, Andre Hernich, Nicole Schweikardt

Unidirectional Input/Output Streaming Complexity of Reversal and Sorting

We consider unidirectional data streams with restricted access, such as read-only and write-only streams. For read-write streams, we also introduce a new complexity measure called expansion, the ratio between the space used on the stream…

Data Structures and Algorithms · Computer Science 2014-07-21 Nathanaël François, Rahul Jain, Frederic Magniez

Efficient Algorithms and Data Structures for Massive Data Sets

For many algorithmic problems, traditional algorithms that optimise on the number of instructions executed prove expensive on I/Os. Novel and very different design techniques, when applied to these problems, can produce algorithms that are…

Data Structures and Algorithms · Computer Science 2010-05-20 Alka

Lower Bounds for External Memory Integer Sorting via Network Coding

Sorting extremely large datasets is a frequently occurring task in practice. These datasets are usually much larger than the computer's main memory; thus external memory sorting algorithms, first introduced by Aggarwal and Vitter (1988), are…

Data Structures and Algorithms · Computer Science 2018-11-06 Alireza Farhadi, MohammadTaghi Hajiaghayi, Kasper Green Larsen, Elaine Shi

On the Complexity of List Ranking in the Parallel External Memory Model

We study the problem of list ranking in the parallel external memory (PEM) model. We observe an interesting dual nature for the hardness of the problem due to limited information exchange among the processors about the structure of the…

Data Structures and Algorithms · Computer Science 2014-09-08 Riko Jacob, Tobias Lieber, Nodari Sitchinava

Equivalence between Priority Queues and Sorting in External Memory

A priority queue is a fundamental data structure that maintains a dynamic ordered set of keys and supports the following basic operations: insertion of a key, deletion of a key, and finding the smallest key. The complexity of the priority…

Data Structures and Algorithms · Computer Science 2012-07-19 Zhewei Wei, Ke Yi

Hierarchical Clustering in Graph Streams: Single-Pass Algorithms and Space Lower Bounds

The Hierarchical Clustering (HC) problem consists of building a hierarchy of clusters to represent a given dataset. Motivated by modern large-scale applications, we study the problem in the streaming model, in which the memory is…

Data Structures and Algorithms · Computer Science 2022-06-16 Sepehr Assadi, Vaggos Chatziafratis, Jakub Łącki, Vahab Mirrokni, Chen Wang

Scalable Distributed-Memory External Sorting

We engineer algorithms for sorting huge data sets on massively parallel machines. The algorithms are based on the multiway merging paradigm. We first outline an algorithm whose I/O requirement is close to a lower bound. Thus, in contrast to…

Data Structures and Algorithms · Computer Science 2009-10-15 Mirko Rahn, Peter Sanders, Johannes Singler

The Case for External Graph Sketching

Algorithms in the data stream model use $O(\mathrm{polylog}(N))$ space to compute some property of an input of size $N$, and many of these algorithms are implemented and used in practice. However, sketching algorithms in the graph semi-streaming…

Data Structures and Algorithms · Computer Science 2025-04-25 Michael A. Bender, Martín Farach-Colton, Riko Jacob, Hanna Komlós, David Tench, Evan West

Algorithms for Efficient, Compact Online Data Stream Curation

Data stream algorithms tackle operations on high-volume sequences of read-once data items. Data stream scenarios include inherently real-time systems like sensor networks and financial markets. They also arise in purely-computational…

Data Structures and Algorithms · Computer Science 2024-03-04 Matthew Andres Moreno, Santiago Rodriguez Papa, Emily Dolson

DecreaseKeys are Expensive for External Memory Priority Queues

One of the biggest open problems in external memory data structures is the priority queue problem with DecreaseKey operations. If only Insert and ExtractMin operations need to be supported, one can design a comparison-based priority queue…

Data Structures and Algorithms · Computer Science 2016-11-04 Kasper Eenberg, Kasper Green Larsen, Huacheng Yu

Lower Bounds for Pseudo-Deterministic Counting in a Stream

Many streaming algorithms provide only a high-probability relative approximation. These two relaxations, of allowing approximation and randomization, seem necessary -- for many streaming problems, both relaxations must be employed…

Data Structures and Algorithms · Computer Science 2023-05-16 Vladimir Braverman, Robert Krauthgamer, Aditya Krishnan, Shay Sapir

Tight Time-Space Lower Bounds for Constant-Pass Learning

In his breakthrough paper, Raz showed that any parity learning algorithm requires either quadratic memory or an exponential number of samples [FOCS'16, JACM'19]. A line of work that followed extended this result to a large class of learning…

Machine Learning · Computer Science 2023-10-13 Xin Lyu, Avishay Tal, Hongxun Wu, Junzhao Yang

Multi-Pass Streaming Lower Bounds for Approximating Max-Cut

In the Max-Cut problem in the streaming model, an algorithm is given the edges of an unknown graph $G = (V,E)$ in some fixed order, and its goal is to approximate the size of the largest cut in $G$. Improving upon an earlier result of…

Data Structures and Algorithms · Computer Science 2025-09-08 Yumou Fei, Dor Minzer, Shuo Wang

Streaming Hypergraph Partitioning Algorithms on Limited Memory Environments

Many well-known, real-world problems involve dynamic data which describe the relationship among the entities. Hypergraphs are powerful combinatorial structures that are frequently used to model such data. For many of today's data-centric…

Data Structures and Algorithms · Computer Science 2021-03-10 Fatih Taşyaran, Berkay Demireller, Kamer Kaya, Bora Uçar

Efficient Verification of Concurrent Programs Over TSO Memory Model

We address the problem of efficient verification of multi-threaded programs running over Total Store Order (TSO) memory model. It has been shown that even with finite data domain programs, the complexity of control state reachability under…

Logic in Computer Science · Computer Science 2016-06-20 Chinmay Narayan, Subodh Sharma, S. Arun-Kumar

Tight Bounds for Online Stable Sorting

Although many authors have considered how many ternary comparisons it takes to sort a multiset $S$ of size $n$, the best known upper and lower bounds still differ by a term linear in $n$. In this paper we restrict our attention to online…

Data Structures and Algorithms · Computer Science 2009-07-07 Travis Gagie, Yakov Nekrich

Nearest Neighbor based Clustering Algorithm for Large Data Sets

Clustering is an unsupervised learning technique in which data or objects are grouped into sets based on some similarity measure. Most of the clustering algorithms assume that the main memory is infinite and can accommodate the set of…

Data Structures and Algorithms · Computer Science 2015-05-25 Pankaj Kumar Yadav, Sriniwas Pandey, Sraban Kumar Mohanty

How Hard is Counting Triangles in the Streaming Model

The problem of (approximately) counting the number of triangles in a graph is one of the basic problems in graph theory. In this paper we study the problem in the streaming model. We study the amount of memory required by a randomized…

Data Structures and Algorithms · Computer Science 2013-04-05 Vladimir Braverman, Rafail Ostrovsky, Dan Vilenchik

Spiking Neural Networks Through the Lens of Streaming Algorithms

We initiate the study of biological neural networks from the perspective of streaming algorithms. Like computers, human brains suffer from memory limitations which pose a significant obstacle when processing large scale and dynamically…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-06 Yael Hitron, Cameron Musco, Merav Parter