Related papers: Skew in Parallel Query Processing

200 papers

We study the problem of computing conjunctive queries over large databases on parallel architectures without shared storage. Using the structure of such a query $q$ and the skew in the data, we study tradeoffs between the number of…

Databases · Computer Science 2016-02-22 Paul Beame, Paraschos Koutris, Dan Suciu

In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with $p$ servers. In contrast to previous work, where upper and lower bounds on the…

Databases · Computer Science 2016-04-08 Paul Beame, Paraschos Koutris, Dan Suciu

Handling skew is one of the major challenges in query processing. In distributed computational environments such as MapReduce, uneven distribution of the data to the servers is not desired. One of the dominant measures that we want to…

Databases · Computer Science 2015-04-14 Foto N. Afrati, Jeffrey D. Ullman, Angelos Vasilakopoulos

We consider the problem of computing a relational query $q$ on a large input database of size $n$, using a large number $p$ of servers. The computation is performed in rounds, and each server can receive only $O(n/p^{1-\varepsilon})$ bits…

Databases · Computer Science 2013-06-26 Paul Beame, Paraschos Koutris, Dan Suciu

We study the problem of computing a full Conjunctive Query in parallel using $p$ heterogeneous machines. Our computational model is similar to the MPC model, but each machine has its own cost function mapping from the number of bits it…

Databases · Computer Science 2025-03-12 Simon Frisk, Paraschos Koutris

A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over…

Databases · Computer Science 2015-01-06 Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, Thomas Schwentick

Nowadays, the data to be processed by database systems has grown so large that any conventional, centralized technique is inadequate. At the same time, general purpose computation on GPU (GPGPU) recently has successfully drawn attention…

Databases · Computer Science 2013-09-04 Georgios Koutsoumpakis, Iakovos Koutsoumpakis, Anastasios Gounaris

In this paper, we investigate the problem of computing a multiway join in one round of MapReduce when the data may be skewed. We optimize on communication cost, i.e., the amount of data that is transferred from the mappers to the reducers.…

Databases · Computer Science 2020-01-14 Foto Afrati, Nikos Stasinopoulos, Jeffrey D. Ullman, Angelos Vassilakopoulos

We study the problem of finding and monitoring fixed-size subgraphs in a continually changing large-scale graph. We present the first approach that (i) performs worst-case optimal computation and communication, (ii) maintains a total memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-13 Khaled Ammar, Frank McSherry, Semih Salihoglu, Manas Joglekar

We study statistical problems, such as planted clique, its variants, and sparse principal component analysis in the context of average-case communication complexity. Our motivation is to understand the statistical-computational trade-offs…

Computational Complexity · Computer Science 2021-07-06 Cyrus Rashtchian, David P. Woodruff, Peng Ye, Hanlin Zhu

We consider a model inspired by compatibility constraints that arise between tasks and servers in data centers, cloud computing systems and content delivery networks. The constraints are represented by a bipartite graph or network that…

Probability · Mathematics 2024-04-10 Diego Goldsztajn, Sem C. Borst, Johan S. H. van Leeuwaarden

While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections. Programmers are forced either to perform…

Databases · Computer Science 2020-11-13 Jaclyn Smith, Michael Benedikt, Milos Nikolic, Amir Shaikhha

We study the hardness of Approximate Query Processing (AQP) of various types of queries involving joins over multiple tables of possibly different sizes. In the case where the query result is a single value (e.g., COUNT, SUM, and…

Databases · Computer Science 2020-10-02 Tianyu Liu, Chi Wang

Single-round multiway join algorithms first reshuffle data over many servers and then evaluate the query at hand in a parallel and communication-free way. A key question is whether a given distribution policy for the reshuffle is adequate…

Databases · Computer Science 2015-12-22 Gaetano Geck, Bas Ketsman, Frank Neven, Thomas Schwentick

Finding a maximum clique in a given graph is one of the fundamental NP-hard problems. We compare two multi-core thread-parallel adaptations of a state-of-the-art branch and bound algorithm for the maximum clique problem, and provide a novel…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-09-05 Ciaran McCreesh, Patrick Prosser

Nested relational query languages have been explored extensively, and underlie industrial language-integrated query systems such as Microsoft's LINQ. However, relational databases do not natively support nested collections in query results.…

Databases · Computer Science 2014-05-05 James Cheney, Sam Lindley, Philip Wadler

Recently, MapReduce based spatial query systems have emerged as a cost effective and scalable solution to large scale spatial data processing and analytics. MapReduce based systems achieve massive scalability by partitioning the data and…

Databases · Computer Science 2015-09-04 Ablimit Aji, Vo Hoang, Fusheng Wang

Supercomputers are equipped with an increasingly large number of cores to use computational power as a way of solving problems that are otherwise intractable. Unfortunately, getting serial algorithms to run in parallel to take advantage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-12-31 Faisal N. Abu-Khzam, Khuzaima Daudjee, Amer E. Mouawad, Naomi Nishimura

Traditional statistical analysis requires that the analysis process and data are independent. By contrast, the new field of adaptive data analysis hopes to understand and provide algorithms and accuracy guarantees for research as it is…

Machine Learning · Computer Science 2017-03-22 Sam Elder

In this paper we study the problem of dynamically maintaining graph properties under batches of edge insertions and deletions in the massively parallel model of computation. In this setting, the graph is stored on a number of machines, each…

Data Structures and Algorithms · Computer Science 2019-08-07 David Durfee, Laxman Dhulipala, Janardhan Kulkarni, Richard Peng, Saurabh Sawlani, Xiaorui Sun