Related papers: Parallelizing Query Optimization on Shared-Nothing…
Scheduling query execution plans is a particularly complex problem in shared-nothing parallel systems, where each site consists of a collection of local time-shared (e.g., CPU(s) or disk(s)) and space-shared (e.g., memory) resources and…
SQL-on-Hadoop systems, query optimization, data distribution over multiple nodes and parallelization techniques are few of the areas under extreme research these days. Big names like Amazon, Google, Microsoft and many more are working on…
As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies-data, model, sequence, and…
Many modern applications require real-time processing of large volumes of high-speed data. Such data processing needs can be modeled as a streaming computation. A streaming computation is specified as a dataflow graph that exposes multiple…
In this paper, we present several improvements in the parallelization of the in-place merge algorithm, which merges two contiguous sorted arrays into one with an O(T) space complexity (where T is the number of threads). The approach divides…
We analyze parallel algorithms in the context of exhaustive search over totally ordered sets. Imagine an infinite list of "boxes", with a "treasure" hidden in one of them, where the boxes' order reflects the importance of finding the…
Traditionally, query optimizers have been designed for computer systems that share a common architecture, consisting of a CPU, main memory and disk subsystem. The efficiency of query optimizers and their successful employment relied on the…
Sorting is one of the most fundamental problems in the field of computer science. With the rapid development of manycore processors, it shows great importance to design efficient parallel sort algorithm on manycore architecture. This paper…
Optimizing the parallel training of large models requires exploring intra-operator parallelism plans for a computation graph that typically contains tens of thousands of primitive operators. While the optimization of parallel data…
Entity matching is an important and difficult step for integrating web data. To reduce the typically high execution time for matching we investigate how we can perform entity matching in parallel on a distributed infrastructure. We propose…
Parallel batched data structures are designed to process synchronized batches of operations in a parallel computing model. In this paper, we propose parallel combining, a technique that implements a concurrent data structure from a parallel…
The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on…
Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation in processing graphs. Recently, size, variety, and structural complexity of these networks has grown dramatically.…
We consider a parallel system of $m$ identical machines prone to unpredictable crashes and restarts, trying to cope with the continuous arrival of tasks to be executed. Tasks have different computational requirements (i.e., processing time…
In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with $p$ servers. In contrast to previous work, where upper and lower bounds on the…
The aim of the paper is to introduce general techniques in order to optimize the parallel execution time of sorting on a distributed architectures with processors of various speeds. Such an application requires a partitioning step. For…
The availability of large number of processing nodes in a parallel and distributed computing environment enables sophisticated real time processing over high speed data streams, as required by many emerging applications. Sliding window…
This paper studies parallelization schemes for stochastic Vector Quantization algorithms in order to obtain time speed-ups using distributed resources. We show that the most intuitive parallelization scheme does not lead to better…
Topic modeling is a very powerful technique in data analysis and data mining but it is generally slow. Many parallelization approaches have been proposed to speed up the learning process. However, they are usually not very efficient because…
The vast amounts of data used in social, business or traffic networks, biology and other natural sciences are often managed in graph-based data sets, consisting of a few thousand up to billions and trillions of vertices and edges,…