Related papers: Fast Distributed Complex Join Processing
As database query processing techniques are being used to handle diverse workloads, a key emerging challenge is how to efficiently handle multi-way join queries containing multiple many-to-many joins. While uncommon in traditional…
Selecting appropriate distributed join methods for logical join operations in a query plan is crucial for the performance of data-intensive scalable computing (DISC). Different network communication patterns in the data exchange phase…
We study three-way joins on MapReduce. Joins are very useful in a multitude of applications from data integration and traversing social networks, to mining graphs and automata-based constructions. However, joins are expensive, even for…
Multi-way Theta-join queries are powerful in describing complex relations and therefore widely employed in real practices. However, existing solutions from traditional distributed and parallel databases for multi-way Theta-join queries…
It is crucial to provide real-time performance in many applications, such as interactive and exploratory data analysis. In these settings, users often need to view subsets of query results quickly. It is challenging to deliver such results…
The Join operator, as one of the most expensive and commonly used operators in database systems, plays a substantial role in Database Management System (DBMS) performance. Among the many different Join algorithms studied over the last…
This paper investigates distributed resource allocation optimization over directed graphs with limited communication bandwidth. We develop a novel distributed algorithm that integrates the centralized Proximal Jacobian Alternating Direction…
In distributed optimization and federated learning, asynchronous alternating direction method of multipliers (ADMM) serves as an attractive option for large-scale optimization, data privacy, straggler nodes and variety of objective…
Joining trajectory datasets is a significant operation in mobility data analytics and the cornerstone of various methods that aim to extract knowledge out of them. In the era of Big Data, the production of mobility data has become massive…
In this paper, we propose a novel distributed alternating direction method of multipliers (ADMM) algorithm with synergetic communication and computation, called SCCD-ADMM, to reduce the total communication and computation cost of the…
Handling skew is one of the major challenges in query processing. In distributed computational environments such as MapReduce, uneven distribution of the data to the servers is not desired. One of the dominant measures that we want to…
We address distributed learning problems over undirected networks. Specifically, we focus on designing a novel ADMM-based algorithm that is jointly computation- and communication-efficient. Our design guarantees computational efficiency by…
Streaming computing enables the real-time processing of large volumes of data and offers significant advantages for various applications, including real-time recommendations, anomaly detection, and monitoring. The multi-way stream join…
In the last few years, much effort has been devoted to developing join algorithms in order to achieve worst-case optimality for join queries over relational databases. Towards this end, the database community has had considerable success in…
We propose a distributed algorithm, named Distributed Alternating Direction Method of Multipliers (D-ADMM), for solving separable optimization problems in networks of interconnected nodes or agents. In a separable optimization problem there…
Large Language Models (LLMs) are being increasingly used within data systems to process large datasets with text fields. A broad class of such tasks involves a semantic join-joining two tables based on a natural language predicate per pair…
Join processing is a fundamental operation in database management systems; however, traditional join algorithms often encounter efficiency challenges when dealing with complex queries that produce intermediate results much larger than the…
Methods for distributed optimization have received significant attention in recent years owing to their wide applicability in various domains. A distributed optimization method typically consists of two key components: communication and…
Machine Learning has proven useful in the recent years as a way to achieve failure prediction for industrial systems. However, the high computational resources necessary to run learning algorithms are an obstacle to its widespread…
Multi-agent distributed consensus optimization problems arise in many signal processing applications. Recently, the alternating direction method of multipliers (ADMM) has been used for solving this family of problems. ADMM based distributed…