Related papers: Skew in Parallel Query Processing

Communication Cost in Parallel Query Processing

We study the problem of computing conjunctive queries over large databases on parallel architectures without shared storage. Using the structure of such a query $q$ and the skew in the data, we study tradeoffs between the number of…

Databases · Computer Science 2016-02-22 Paul Beame, Paraschos Koutris, Dan Suciu

Worst-Case Optimal Algorithms for Parallel Query Processing

In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with $p$ servers. In contrast to previous work, where upper and lower bounds on the…

Databases · Computer Science 2016-04-08 Paul Beame, Paraschos Koutris, Dan Suciu

Handling Skew in Multiway Joins in Parallel Processing

Handling skew is one of the major challenges in query processing. In distributed computational environments such as MapReduce, uneven distribution of the data to the servers is not desired. One of the dominant measures that we want to…

Databases · Computer Science 2015-04-14 Foto N. Afrati, Jeffrey D. Ullman, Angelos Vasilakopoulos

Communication Steps for Parallel Query Processing

We consider the problem of computing a relational query $q$ on a large input database of size $n$, using a large number $p$ of servers. The computation is performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$ bits…

Databases · Computer Science 2013-06-26 Paul Beame, Paraschos Koutris, Dan Suciu

Parallel Query Processing with Heterogeneous Machines

We study the problem of computing a full Conjunctive Query in parallel using $p$ heterogeneous machines. Our computational model is similar to the MPC model, but each machine has its own cost function mapping from the number of bits it…

Databases · Computer Science 2025-03-12 Simon Frisk, Paraschos Koutris

Parallel-Correctness and Transferability for Conjunctive Queries

A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over…

Databases · Computer Science 2015-01-06 Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, Thomas Schwentick

Skew Handling in Aggregate Streaming Queries on GPUs

Nowadays, the data to be processed by database systems has grown so large that any conventional, centralized technique is inadequate. At the same time, general purpose computation on GPU (GPGPU) recently has successfully drawn attention…

Databases · Computer Science 2013-09-04 Georgios Koutsoumpakis, Iakovos Koutsoumpakis, Anastasios Gounaris

SharesSkew: An Algorithm to Handle Skew for Joins in MapReduce

In this paper, we investigate the problem of computing a multiway join in one round of MapReduce when the data may be skewed. We optimize on communication cost, i.e., the amount of data that is transferred from the mappers to the reducers.…

Databases · Computer Science 2020-01-14 Foto Afrati, Nikos Stasinopoulos, Jeffrey D. Ullman, Angelos Vassilakopoulos

Distributed Evaluation of Subgraph Queries Using Worst-case Optimal Low-Memory Dataflows

We study the problem of finding and monitoring fixed-size subgraphs in a continually changing large-scale graph. We present the first approach that (i) performs worst-case optimal computation and communication, (ii) maintains a total memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-13 Khaled Ammar, Frank McSherry, Semih Salihoglu, Manas Joglekar

Average-Case Communication Complexity of Statistical Problems

We study statistical problems, such as planted clique, its variants, and sparse principal component analysis in the context of average-case communication complexity. Our motivation is to understand the statistical-computational trade-offs…

Computational Complexity · Computer Science 2021-07-06 Cyrus Rashtchian, David P. Woodruff, Peng Ye, Hanlin Zhu

Server saturation in skewed networks

We consider a model inspired by compatibility constraints that arise between tasks and servers in data centers, cloud computing systems and content delivery networks. The constraints are represented by a bipartite graph or network that…

Probability · Mathematics 2024-04-10 Diego Goldsztajn, Sem C. Borst, Johan S. H. van Leeuwaarden

Scalable Querying of Nested Data

While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections. Programmers are forced either to perform…

Databases · Computer Science 2020-11-13 Jaclyn Smith, Michael Benedikt, Milos Nikolic, Amir Shaikhha

Understanding the hardness of approximate query processing with joins

We study the hardness of Approximate Query Processing (AQP) of various types of queries involving joins over multiple tables of possibly different sizes. In the case where the query result is a single value (e.g., COUNT, SUM, and…

Databases · Computer Science 2020-10-02 Tianyu Liu, Chi Wang

Parallel-Correctness and Containment for Conjunctive Queries with Union and Negation

Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate…

Databases · Computer Science 2015-12-22 Gaetano Geck, Bas Ketsman, Frank Neven, Thomas Schwentick

The Shape of the Search Tree for the Maximum Clique Problem, and the Implications for Parallel Branch and Bound

Finding a maximum clique in a given graph is one of the fundamental NP-hard problems. We compare two multi-core thread-parallel adaptations of a state-of-the-art branch and bound algorithm for the maximum clique problem, and provide a novel…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-09-05 Ciaran McCreesh, Patrick Prosser

Query shredding: Efficient relational evaluation of queries over nested multisets (extended version)

Nested relational query languages have been explored extensively, and underlie industrial language-integrated query systems such as Microsoft's LINQ. However, relational databases do not natively support nested collections in query results.…

Databases · Computer Science 2014-05-05 James Cheney, Sam Lindley, Philip Wadler

Effective Spatial Data Partitioning for Scalable Query Processing

Recently, MapReduce based spatial query systems have emerged as a cost effective and scalable solution to large scale spatial data processing and analytics. MapReduce based systems achieve massive scalability by partitioning the data and…

Databases · Computer Science 2015-09-04 Ablimit Aji, Vo Hoang, Fusheng Wang

An Easy-to-use Scalable Framework for Parallel Recursive Backtracking

Supercomputers are equipped with an increasingly large number of cores to use computational power as a way of solving problems that are otherwise intractable. Unfortunately, getting serial algorithms to run in parallel to take advantage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-12-31 Faisal N. Abu-Khzam, Khuzaima Daudjee, Amer E. Mouawad, Naomi Nishimura

Challenges in Bayesian Adaptive Data Analysis

Traditional statistical analysis requires that the analysis process and data are independent. By contrast, the new field of adaptive data analysis hopes to understand and provide algorithms and accuracy guarantees for research as it is…

Machine Learning · Computer Science 2017-03-22 Sam Elder

Parallel Batch-Dynamic Graphs: Algorithms and Lower Bounds

In this paper we study the problem of dynamically maintaining graph properties under batches of edge insertions and deletions in the massively parallel model of computation. In this setting, the graph is stored on a number of machines, each…

Data Structures and Algorithms · Computer Science 2019-08-07 David Durfee, Laxman Dhulipala, Janardhan Kulkarni, Richard Peng, Saurabh Sawlani, Xiaorui Sun