Related papers: High-Speed Query Processing over High-Speed Networ…

Processing Database Joins over a Shared-Nothing System of Multicore Machines

To process a large volume of data, modern data management systems use a collection of machines connected through a network. This paper looks into the feasibility of scaling up such a shared-nothing system while processing a compute- and…

Databases · Computer Science 2018-04-26 Abhirup Chakraborty

Fast OLAP Query Execution in Main Memory on Large Data in a Cluster

Main memory column-stores have proven to be efficient for processing analytical queries. Still, there has been much less work in the context of clusters. Using only a single machine poses several restrictions: Processing power and data…

Databases · Computer Science 2017-09-18 Demian Hespe , Martin Weidner , Jonathan Dees , Peter Sanders

Scalable Hash Table for NUMA Systems

Hash tables are used in a plethora of applications, including database operations, DNA sequencing, string searching, and many more. As such, there are many parallelized hash tables targeting multicore, distributed, and accelerator-based…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-05 Alok Tripathy , Oded Green

Terabyte-Scale Analytics in the Blink of an Eye

For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of a paradigm shift. The scaling laws and…

Databases · Computer Science 2025-08-05 Bowen Wu , Wei Cui , Carlo Curino , Matteo Interlandi , Rathijit Sen

Fast Query Processing by Distributing an Index over CPU Caches

Data intensive applications on clusters often require requests quickly be sent to the node managing the desired data. In many applications, one must look through a sorted tree structure to determine the responsible node for accessing or…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Xiaoqin Ma , Gene Cooperman

Multi-Resource Parallel Query Scheduling and Optimization

Scheduling query execution plans is a particularly complex problem in shared-nothing parallel systems, where each site consists of a collection of local time-shared (e.g., CPU(s) or disk(s)) and space-shared (e.g., memory) resources and…

Databases · Computer Science 2014-04-01 Minos Garofalakis , Yannis Ioannidis

Queue Management in Network Processors

One of the main bottlenecks when designing a network processing system is very often its memory subsystem. This is mainly due to the state-of-the-art network links operating at very high speeds and to the fact that in order to support…

Hardware Architecture · Computer Science 2011-11-09 I. Papaefstathiou , T. Orphanoudakis , G. Kornaros , C. Kachris , I. Mavroidis , A. Nikologiannis

Evaluation of a Simple, Scalable, Parallel Best-First Search Strategy

Large-scale, parallel clusters composed of commodity processors are increasingly available, enabling the use of vast processing capabilities and distributed RAM to solve hard search problems. We investigate Hash-Distributed A* (HDA*), a…

Artificial Intelligence · Computer Science 2015-03-20 Akihiro Kishimoto , Alex Fukunaga , Adi Botea

Scaling Ordered Stream Processing on Shared-Memory Multicores

Many modern applications require real-time processing of large volumes of high-speed data. Such data processing needs can be modeled as a streaming computation. A streaming computation is specified as a dataflow graph that exposes multiple…

Databases · Computer Science 2018-04-02 Guna Prasaad , G. Ramalingam , Kaushik Rajan

Diagonal Scaling: A Multi-Dimensional Resource Model and Optimization Framework for Distributed Databases

Modern cloud databases present scaling as a binary decision: scale-out by adding nodes or scale-up by increasing per-node resources. This one-dimensional view is limiting because database performance, cost, and coordination overhead emerge…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-05 Shahir Abdullah , Syed Rohit Zaman

Optimizing Scalable Multi-Cluster Architectures for Next-Generation Wireless Sensing and Communication

Next-generation wireless technologies (for immersive-massive communication, joint communication and sensing) demand highly parallel architectures for massive data processing. A common architectural template scales up by grouping tens to…

Hardware Architecture · Computer Science 2025-07-08 Samuel Riedel , Yichao Zhang , Marco Bertuletti , Luca Benini

Cache-based Multi-query Optimization for Data-intensive Scalable Computing Frameworks

In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in…

Databases · Computer Science 2018-05-23 Pietro Michiardi , Damiano Carra , Sara Migliorini

Hyper: Distributed Cloud Processing for Large-Scale Deep Learning Tasks

Training and deploying deep learning models in real-world applications require processing large amounts of data. This is a challenging task when the amount of data grows to a hundred terabytes, or even, petabyte-scale. We introduce a hybrid…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-17 Davit Buniatyan

Hybrid Parallel Bidirectional Sieve based on SMP Cluster

In this article, hybrid parallel bidirectional sieve method is implemented by SMP Cluster, the individual computational units joined together by the communication network, are usually shared-memory systems with one or more multicore…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-05-23 Gang Liao , Lian Luo , Lei Liu

Fast Data Management with Distributed Streaming SQL

To stay competitive in today's data driven economy, enterprises large and small are turning to stream processing platforms to process high volume, high velocity, and diverse streams of data (fast data) as they arrive. Low-level programming…

Databases · Computer Science 2015-11-13 Milinda Pathirage , Beth Plale

Parallelizing Query Optimization on Shared-Nothing Architectures

Data processing systems offer an ever increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query…

Databases · Computer Science 2015-11-06 Immanuel Trummer , Christoph Koch

Theseus: A Distributed and Scalable GPU-Accelerated Query Processing Platform Optimized for Efficient Data Movement

Online analytical processing of queries on datasets in the many-terabyte range is only possible with costly distributed computing systems. To decrease the cost and increase the throughput, systems can leverage accelerators such as GPUs,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-08 Felipe Aramburú , William Malpica , Kaouther Abrougui , Amin Aramoon , Romulo Auccapuclla , Claude Brisson , Matthijs Brobbel , Colby Farrell , Pradeep Garigipati , Joost Hoozemans , Supun Kamburugamuve , Akhil Nair , Alexander Ocsa , Johan Peltenburg , Rubén Quesada López , Deepak Sihag , Ahmet Uyar , Dhruv Vats , Michael Wendt , Jignesh M. Patel , Rodrigo Aramburú

On the Design and Analysis of Parallel and Distributed Algorithms

Arrival of multicore systems has enforced a new scenario in computing, the parallel and distributed algorithms are fast replacing the older sequential algorithms, with many challenges of these techniques. The distributed algorithms provide…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-13 Rajendra Purohit , K R Chowdhary , S D Purohit

One Ring to Shuffle Them All: Scalable Intra-Process Data Redistribution with Ring-Buffer Shuffle in Redpanda Oxla

As server CPUs scale to dozens and now hundreds of cores per socket, parallel query engines must rethink how they redistribute data between threads. Partitioned operators such as hash joins and aggregations require frequent data…

Databases · Computer Science 2026-05-29 Adam Szymański , Tyler Akidau

On the Performance of MPI-OpenMP on a 12 nodes Multi-core Cluster

With the increasing number of Quad-Core-based clusters and the introduction of compute nodes designed with large memory capacity shared by multiple cores, new problems related to scalability arise. In this paper, we analyze the overall…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-08-17 Abdelgadir Tageldin Abdelgadir , Al-Sakib Khan Pathan , Mohiuddin Ahmed