Related papers: Multi Query Optimization in GLADE

Parallelizing Query Optimization on Shared-Nothing Architectures

Data processing systems offer an ever increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query…

Databases · Computer Science 2015-11-06 Immanuel Trummer, Christoph Koch

Multi-Resource Parallel Query Scheduling and Optimization

Scheduling query execution plans is a particularly complex problem in shared-nothing parallel systems, where each site consists of a collection of local time-shared (e.g., CPU(s) or disk(s)) and space-shared (e.g., memory) resources and…

Databases · Computer Science 2014-04-01 Minos Garofalakis, Yannis Ioannidis

Scaling Ordered Stream Processing on Shared-Memory Multicores

Many modern applications require real-time processing of large volumes of high-speed data. Such data processing needs can be modeled as a streaming computation. A streaming computation is specified as a dataflow graph that exposes multiple…

Databases · Computer Science 2018-04-02 Guna Prasaad, G. Ramalingam, Kaushik Rajan

A Comparative Exploration of ML Techniques for Tuning Query Degree of Parallelism

There is a large body of recent work applying machine learning (ML) techniques to query optimization and query performance prediction in relational database management systems (RDBMSs). However, these works typically ignore the effect of…

Databases · Computer Science 2020-05-22 Zhiwei Fan, Rathijit Sen, Paraschos Koutris, Aws Albarghouthi

Scheduling of Graph Queries: Controlling Intra- and Inter-query Parallelism for a High System Throughput

The vast amounts of data used in social, business or traffic networks, biology and other natural sciences are often managed in graph-based data sets, consisting of a few thousand up to billions and trillions of vertices and edges,…

Databases · Computer Science 2021-10-22 Matthias Hauck, Ismail Oukid, Holger Fröning

Diagonal Scaling: A Multi-Dimensional Resource Model and Optimization Framework for Distributed Databases

Modern cloud databases present scaling as a binary decision: scale-out by adding nodes or scale-up by increasing per-node resources. This one-dimensional view is limiting because database performance, cost, and coordination overhead emerge…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-05 Shahir Abdullah, Syed Rohit Zaman

Parallel aggregation is a ubiquitous operation in data analytics that is expressed as GROUP BY in SQL, reduce in Hadoop, or segment in TensorFlow. Parallel aggregation starts with an optional local pre-aggregation step and then repartitions…

Databases · Computer Science 2018-11-30 Feilong Liu, Ario Salmasi, Spyros Blanas, Anastasios Sidiropoulos

Robust Recursive Query Parallelism in Graph Database Management Systems

Efficient multi-core parallel processing of recursive join queries is critical for achieving good performance in graph database management systems (GDBMSs). Prior work adopts two broad approaches. First is the state-of-the-art morsel-driven…

Databases · Computer Science 2025-08-28 Anurag Chakraborty, Semih Salihoğlu

Revisiting Query Performance in GPU Database Systems

GPUs offer massive compute parallelism and high-bandwidth memory accesses. GPU database systems seek to exploit those capabilities to accelerate data analytics. Although modern GPUs have more resources (e.g., higher DRAM bandwidth) than…

Databases · Computer Science 2023-02-03 Jiashen Cao, Rathijit Sen, Matteo Interlandi, Joy Arulraj, Hyesoon Kim

Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach

As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies-data, model, sequence, and…

Machine Learning · Computer Science 2025-03-13 Ruifeng She, Bowen Pang, Kai Li, Zehua Liu, Tao Zhong

Flat Parallelization

There are two intertwined factors that affect performance of concurrent data structures: the ability of processes to access the data in parallel and the cost of synchronization. It has been observed that for a large class of…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-10 Vitaly Aksenov, Petr Kuznetsov

Cache-based Multi-query Optimization for Data-intensive Scalable Computing Frameworks

In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in…

Databases · Computer Science 2018-05-23 Pietro Michiardi, Damiano Carra, Sara Migliorini

Efficient Massively Parallel Join Optimization for Large Queries

Modern data analytical workloads often need to run queries over a large number of tables. An optimal query plan for such queries is crucial for being able to run these queries within acceptable time bounds. However, with queries involving…

Databases · Computer Science 2022-03-02 Riccardo Mancini, Srinivas Karthik, Bikash Chandra, Vasilis Mageirakos, Anastasia Ailamaki

Large-Scale Query and XMatch, Entering the Parallel Zone

Current and future astronomical surveys are producing catalogs with millions and billions of objects. On-line access to such big datasets for data mining and cross-correlation is usually as highly desired as unfeasible. Providing these…

Databases · Computer Science 2007-05-23 Maria A. Nieto-Santisteban, Aniruddha R. Thakar, Alexander S. Szalay, Jim Gray

Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization

Hybrid parallelism techniques are essential for efficiently training large language models (LLMs). Nevertheless, current automatic parallel planning frameworks often overlook the simultaneous consideration of node heterogeneity and dynamic…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-04 Ruilong Wu, Xinjiao Li, Yisu Wang, Xinyu Chen, Dirk Kutscher

Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide

With the rapid growth of large language models (LLMs), a wide range of methods have been developed to distribute computation and memory across hardware devices for efficient training and inference. While existing surveys provide descriptive…

Machine Learning · Computer Science 2026-02-11 Hossam Amer, Rezaul Karim, Ali Pourranjbar, Weiwei Zhang, Walid Ahmed, Boxing Chen

Parallel Scheduling Self-attention Mechanism: Generalization and Optimization

Over the past few years, self-attention has been shining in the field of deep learning, especially in the domain of natural language processing (NLP). Its impressive effectiveness, along with ubiquitous implementations, has aroused our interest…

Machine Learning · Computer Science 2020-12-03 Mingfei Yu, Masahiro Fujita

A horizontally-scalable multiprocessing platform based on Node.js

This paper presents a scalable web-based platform called Node Scala, which allows requests to be split and handled on a parallel distributed system according to pre-defined use cases. We applied this platform to a client application that…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-09-16 Ahmad Maatouki, Marek Szuba, Jörg Meyer, Achim Streit

Stream Processing With Dependency-Guided Synchronization (Extended Version)

Real-time data processing applications with low latency requirements have led to the increasing popularity of stream processing systems. While such systems offer convenient APIs that can be used to achieve data parallelism automatically,…

Programming Languages · Computer Science 2022-01-04 Konstantinos Kallas, Filip Niksic, Caleb Stanford, Rajeev Alur

Data Placement and Replica Selection for Improving Co-location in Distributed Environments

Increasing need for large-scale data analytics in a number of application domains has led to a dramatic rise in the number of distributed data management systems, both parallel relational databases, and systems that support alternative…

Databases · Computer Science 2013-02-19 K. Ashwin Kumar, Amol Deshpande, Samir Khuller