Related papers: Efficient Multi-way Theta-Join Processing Using Ma…

Three-Way Joins on MapReduce: An Experimental Study

We study three-way joins on MapReduce. Joins are very useful in a multitude of applications from data integration and traversing social networks, to mining graphs and automata-based constructions. However, joins are expensive, even for…

Databases · Computer Science 2014-05-19 Ben Kimmett, Alex Thomo, S. Venkatesh

Fast Distributed Complex Join Processing

In this work, we study the problem of co-optimizing communication, pre-computing, and computation cost in one-round multi-way join evaluation. We propose a multi-way join approach ADJ (Adaptive Distributed Join) for complex join which finds…

Databases · Computer Science 2021-03-01 Hao Zhang, Miao Qiao, Jeffrey Xu Yu, Hong Cheng

Optimizing Queries with Many-to-Many Joins

As database query processing techniques are being used to handle diverse workloads, a key emerging challenge is how to efficiently handle multi-way join queries containing multiple many-to-many joins. While uncommon in traditional…

Databases · Computer Science 2025-05-20 Hasara Kalumin, Amol Deshpande

Handling Skew in Multiway Joins in Parallel Processing

Handling skew is one of the major challenges in query processing. In distributed computational environments such as MapReduce, uneven distribution of the data to the servers is not desired. One of the dominant measures that we want to…

Databases · Computer Science 2015-04-14 Foto N. Afrati, Jeffrey D. Ullman, Angelos Vasilakopoulos

GPU-based Efficient Join Algorithms on Hadoop

The growing data has brought tremendous pressure for query processing and storage, so there are many studies that focus on using GPU to accelerate join operation, which is one of the most important operations in modern database systems.…

Databases · Computer Science 2019-04-26 Hongzhi Wang, Ning Li, Zheng Wang, Jianing Li

Multi-Agent Join

It is crucial to provide real-time performance in many applications, such as interactive and exploratory data analysis. In these settings, users often need to view subsets of query results quickly. It is challenging to deliver such results…

Databases · Computer Science 2023-12-25 Vahid Ghadakchi, Mian Xie, Arash Termehchy, Bakhtiyar Doskenov, Bharghav Srikhakollu, Summit Haque, Huazheng Wang

Shared Execution of Path Queries on Road Networks

The advancement of mobile technologies and the proliferation of map-based applications have enabled a user to access a wide variety of services that range from information queries to navigation systems. Due to the popularity of map-based…

Databases · Computer Science 2020-04-24 Hossain Mahmud, Ashfaq Mahmood Amin, Mohammed Eunus Ali, Tanzima Hashem

Fast Join Project Query Evaluation using Matrix Multiplication

In the last few years, much effort has been devoted to developing join algorithms in order to achieve worst-case optimality for join queries over relational databases. Towards this end, the database community has had considerable success in…

Databases · Computer Science 2020-03-02 Shaleen Deep, Xiao Hu, Paraschos Koutris

RelJoin: Relative-cost-based Selection of Distributed Join Methods for Query Plan Optimization

Selecting appropriate distributed join methods for logical join operations in a query plan is crucial for the performance of data-intensive scalable computing (DISC). Different network communication patterns in the data exchange phase…

Databases · Computer Science 2023-12-29 F. Liang, F. C. M. Lau, H. Cui, Y. Li, B. Lin, C. Li, X. Hu

Runtime-optimized Multi-way Stream Join Operator for Large-scale Streaming data

Streaming computing enables the real-time processing of large volumes of data and offers significant advantages for various applications, including real-time recommendations, anomaly detection, and monitoring. The multi-way stream join…

Databases · Computer Science 2024-11-26 Jinlong Hu, Tingfeng Qiu

Efficient query evaluation techniques over large amount of distributed linked data

As RDF becomes more widely established and the amount of linked data is rapidly increasing, the efficient querying of large amounts of data becomes a significant challenge. In this paper, we propose a family of algorithms for querying large…

Databases · Computer Science 2022-09-13 Eleftherios Kalogeros, Manolis Gergatsoulis, Matthew Damigos, Christos Nomikos

Towards Fast Theta-join: A Prefiltering and Amalgamated Partitioning Approach

As one of the most useful online processing techniques, the theta-join operation has been utilized by many applications to fully excavate the relationships between data streams in various scenarios. As such, constant research efforts have…

Data Structures and Algorithms · Computer Science 2022-08-08 Jiashu Wu, Yang Wang, Xiaopeng Fan, Kejiang Ye, Chengzhong Xu

Meta-MapReduce: A Technique for Reducing Communication in MapReduce Computations

MapReduce has proven to be one of the most useful paradigms in the revolution of distributed computing, where cloud services and cluster computing become the standard venue for computing. The federation of cloud and big data activities is…

Databases · Computer Science 2016-07-29 Foto Afrati, Shlomi Dolev, Shantanu Sharma, Jeffrey D. Ullman

Distributed Subtrajectory Join on Massive Datasets

Joining trajectory datasets is a significant operation in mobility data analytics and the cornerstone of various methods that aim to extract knowledge out of them. In the era of Big Data, the production of mobility data has become massive…

Databases · Computer Science 2020-02-07 Panagiotis Tampakis, Christos Doulkeridis, Nikos Pelekis, Yannis Theodoridis

Effective Spatial Data Partitioning for Scalable Query Processing

Recently, MapReduce based spatial query systems have emerged as a cost effective and scalable solution to large scale spatial data processing and analytics. MapReduce based systems achieve massive scalability by partitioning the data and…

Databases · Computer Science 2015-09-04 Ablimit Aji, Vo Hoang, Fusheng Wang

Parallel Evaluation of Multi-Semi-Joins

While services such as Amazon AWS make computing power abundantly available, adding more computing nodes can incur high costs in, for instance, pay-as-you-go plans while not always significantly improving the net running time (aka…

Databases · Computer Science 2016-05-24 Jonny Daenen, Frank Neven, Tony Tan, Stijn Vansummeren

The Family of MapReduce and Large Scale Data Processing Systems

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…

Databases · Computer Science 2013-02-14 Sherif Sakr, Anna Liu, Ayman G. Fayoumi

Cache-based Multi-query Optimization for Data-intensive Scalable Computing Frameworks

In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in…

Databases · Computer Science 2018-05-23 Pietro Michiardi, Damiano Carra, Sara Migliorini

Processing Database Joins over a Shared-Nothing System of Multicore Machines

To process a large volume of data, modern data management systems use a collection of machines connected through a network. This paper looks into the feasibility of scaling up such a shared-nothing system while processing a compute- and…

Databases · Computer Science 2018-04-26 Abhirup Chakraborty

Efficient Massively Parallel Join Optimization for Large Queries

Modern data analytical workloads often need to run queries over a large number of tables. An optimal query plan for such queries is crucial for being able to run these queries within acceptable time bounds. However, with queries involving…

Databases · Computer Science 2022-03-02 Riccardo Mancini, Srinivas Karthik, Bikash Chandra, Vasilis Mageirakos, Anastasia Ailamaki