Related papers: Exoshuffle-CloudSort

200 papers

Shuffle is one of the most expensive communication primitives in distributed data processing and is difficult to scale. Prior work addresses the scalability challenges of shuffle by building monolithic shuffle systems. These systems are…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-21 Frank Sifei Luan, Stephanie Wang, Samyukta Yagati, Sean Kim, Kenneth Lien, Isaac Ong, Tony Hong, SangBin Cho, Eric Liang, Ion Stoica

The paper introduces RADULS, a new parallel sorter based on the radix sort algorithm, intended to organize ultra-large data sets efficiently. For example, 4G 16-byte records can be sorted with 16 threads in less than 15 seconds on Intel…

Data Structures and Algorithms · Computer Science 2016-12-09 Marek Kokot, Sebastian Deorowicz, Agnieszka Debudaj-Grabysz

We focus on sorting, which is the building block of many machine learning algorithms, and propose a novel distributed sorting algorithm, named Coded TeraSort, which substantially improves the execution time of the TeraSort benchmark in…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-17 Songze Li, Sucha Supittayapornpong, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

Cloud computing is a powerful new technology that is widely used in the business world. Recently, we have been investigating the benefits it offers to scientific computing. We have used three workflow applications to compare the performance…

Instrumentation and Methods for Astrophysics · Physics 2015-03-17 G. Bruce Berriman, Ewa Deelman, Gideon Juve, Moira Regelson, Peter Plavchan

The granularity of distributed computing is limited by communication time: there is no point in farming out smaller and smaller tasks if the communication overhead dominates the decrease in processing time due to the added parallelism. In…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-28 Theo Jepsen, Stephen Ibanez, Gregory Valiant, Nick McKeown

Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data,…

Shuffle exchanges intermediate results between upstream and downstream operators in distributed data processing and is usually the bottleneck due to factors such as small random I/Os and network contention. Several systems have been…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-27 Yuhao Lin, Zhipeng Tang, Jiayan Tong, Junqing Xiao, Bin Lu, Yuhang Li, Chao Li, Zhiguo Zhang, Junhua Wang, Hao Luo, James Cheng, Chuang Hu, Jiawei Jiang, Xiao Yan

Sorting is at the core of many database operations, such as index creation, sort-merge joins, and user-requested output sorting. As GPUs are emerging as a promising platform to accelerate various operations, sorting on GPUs becomes a viable…

Databases · Computer Science 2017-05-22 Elias Stehle, Hans-Arno Jacobsen

We present WiscSort, a new approach to high-performance concurrent sorting for existing and future byte-addressable storage (BAS) devices. WiscSort carefully reduces writes, exploits random reads by splitting keys and values during sorting,…

Today's cloud storage services must offer storage reliability and fast data retrieval for large amounts of data without sacrificing storage cost. We present SEARS, a cloud-based storage system which integrates erasure coding and data…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-18 Ying Li, Katherine Guo, Xin Wang, Emina Soljanin, Thomas Woo

Sorting is a foundational primitive in modern data processing, influencing the execution speed of high-performance data pipelines. However, the algorithmic landscape is currently bifurcated by a pervasive "Stability Tax": practitioners must…

Data Structures and Algorithms · Computer Science 2026-05-15 Hriday Jain, Ketan Sabale, Aditya Shastri, Hiren Kumar Thakkar, Ashutosh Londhe

NTsort is an external sort on WindowsNT 5.0. It has minimal functionality but excellent price performance. In particular, running on mail-order hardware it can sort 1.5 GB for a penny. For commercially available sorts, Postman Sort from…

Databases · Computer Science 2007-05-23 Jim Gray, Joshua Coates, Chris Nyberg

Our paper presents solutions that can significantly improve the delay performance of putting and retrieving data in and out of cloud storage. We first focus on measuring the delay performance of a very popular cloud storage service Amazon…

Networking and Internet Architecture · Computer Science 2013-11-04 Guanfeng Liang, Ulas C. Kozat

In this paper we present TSSort, a probabilistic, noise resistant, quickly converging comparison sort algorithm based on Microsoft TrueSkill. The algorithm combines TrueSkill's updating rules with a newly developed next item pair selection…

Data Structures and Algorithms · Computer Science 2016-06-17 Jörn Hees, Benjamin Adrian, Ralf Biedert, Thomas Roth-Berghofer, Andreas Dengel

External sorting is at the core of many operations in large-scale database systems, such as ordering and aggregation queries for large result sets, building indexes, sort-merge joins, duplicate removal, sharding, and record clustering.…

Databases · Computer Science 2023-05-11 Ani Kristo, Tim Kraska

We present sorting algorithms that represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. A part of the speed advantage is due to the feature to work in-place. Previously, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-05 Michael Axtmann, Sascha Witt, Daniel Ferizovic, Peter Sanders

When orchestrating highly distributed and data-intensive Web service workflows the geographical placement of the orchestration engine can greatly affect the overall performance of a workflow. Orchestration engines are typically run from…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-02-04 Michael Luckeneder, Adam Barker

Sorting algorithms are the deciding factor for the performance of common operations such as removal of duplicates or database sort-merge joins. This work focuses on 32-bit integer keys, optionally paired with a 32-bit value. We present a…

Data Structures and Algorithms · Computer Science 2010-09-07 Jan Wassenberg, Peter Sanders

Cloud database systems, particularly their middleware and query execution layers, use sorting as a core operation in query processing, indexing and join execution. Distribution-dependence and limited parallelism are key issues inherent in…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-13 Michael Dang'ana

We present EvoSort, a general-purpose adaptive parallel sorting framework accessible at the Python level. EvoSort employs a Genetic Algorithm (GA) to automatically discover and refine critical parameters, including insertion sort…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-05 Shashank Raj, Kalyanmoy Deb