Related papers: Accelerating Big-Data Sorting Through Programmable…

Scalable Distributed String Sorting

String sorting is an important part of tasks such as building index data structures. Unfortunately, current string sorting algorithms do not scale to massively parallel distributed-memory machines since they either have latency (at least)…

Data Structures and Algorithms · Computer Science 2024-04-26 Florian Kurpicz, Pascal Mehnert, Peter Sanders, Matthias Schimek

Scalable Distributed-Memory External Sorting

We engineer algorithms for sorting huge data sets on massively parallel machines. The algorithms are based on the multiway merging paradigm. We first outline an algorithm whose I/O requirement is close to a lower bound. Thus, in contrast to…

Data Structures and Algorithms · Computer Science 2009-10-15 Mirko Rahn, Peter Sanders, Johannes Singler

Engineering Faster Sorters for Small Sets of Items

Sorting a set of items is a task that can be useful by itself or as a building block for more complex operations. That is why a lot of effort has been put into finding sorting algorithms that sort large sets as fast as possible. But the…

Data Structures and Algorithms · Computer Science 2020-10-05 Timo Bingmann, Jasper Marianczuk, Peter Sanders

Engineering Faster Sorters for Small Sets of Items

Sorting a set of items is a task that can be useful by itself or as a building block for more complex operations. The more sophisticated and fast sorting algorithms become asymptotically, the less efficient they are for small sets of items…

Data Structures and Algorithms · Computer Science 2019-08-23 Jasper Marianczuk

Implementing the Comparison-Based External Sort

In the age of big data, sorting is an indispensable operation for DBMSes and similar systems. Having data sorted can help produce query plans with significantly lower run times. It also can provide other benefits like having non-blocking…

Databases · Computer Science 2022-07-27 Michael Polyntsov, Valentin Grigorev, Kirill Smirnov, George Chernishev

Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems

Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques…

Databases · Computer Science 2012-07-03 Martina-Cezara Albutiu, Alfons Kemper, Thomas Neumann

Hash sort: A linear time complexity multiple-dimensional sort algorithm

Sorting and hashing are two completely different concepts in computer science, and appear mutually exclusive to one another. Hashing is a search method using the data as a key to map to the location within memory, and is used for rapid…

Data Structures and Algorithms · Computer Science 2007-05-23 William F. Gilreath

Random Shuffling to Reduce Disorder in Adaptive Sorting Scheme

In this paper we present a random shuffling scheme to apply with adaptive sorting algorithms. Adaptive sorting algorithms utilize the presortedness present in a given sequence. We have probabilistically increased the amount of presortedness…

Data Structures and Algorithms · Computer Science 2016-08-31 Md. Enamul Karim, Abdun Naser Mahmood

Histogram Sort with Sampling

To minimize data movement, state-of-the-art parallel sorting algorithms use techniques based on sampling and histogramming to partition keys prior to redistribution. Sampling enables partitioning to be done using a representative subset of…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-30 Vipul Harsh, Laxmikant Kale, Edgar Solomonik

Deep Learning Service for Efficient Data Distribution Aware Sorting

In this paper, we present a neural network-enabled data distribution aware sorting method, coined as NN-sort. Our approach explores the potential of developing deep learning techniques to speed up large-scale sort operations, enabling data…

Data Structures and Algorithms · Computer Science 2024-12-16 Xiaoke Zhu, Qi Zhang, Wei Zhou, Ling Liu

High-Performance and Flexible Parallel Algorithms for Semisort and Related Problems

Semisort is a fundamental algorithmic primitive widely used in the design and analysis of efficient parallel algorithms. It takes input as an array of records and a function extracting a key per record, and reorders them so that…

Data Structures and Algorithms · Computer Science 2023-04-21 Xiaojun Dong, Yunshu Wu, Zhongqi Wang, Laxman Dhulipala, Yan Gu, Yihan Sun

Methods for Partitioning Data to Improve Parallel Execution Time for Sorting on Heterogeneous Clusters

The aim of the paper is to introduce general techniques to optimize the parallel execution time of sorting on distributed architectures with processors of various speeds. Such an application requires a partitioning step. For…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-16 Christophe Cérin, Jean-Christophe Dubacq, Jean-Louis Roch, the SafeScale Collaboration

Practical Massively Parallel Sorting

Previous parallel sorting algorithms do not scale to the largest available machines, since they either have prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and…

Data Structures and Algorithms · Computer Science 2015-02-26 Michael Axtmann, Timo Bingmann, Peter Sanders, Christian Schulz

Efficient sorting, duplicate removal, grouping, and aggregation

Database query processing requires algorithms for duplicate removal, grouping, and aggregation. Three algorithms exist: in-stream aggregation is most efficient by far but requires sorted input; sort-based aggregation relies on external…

Databases · Computer Science 2022-09-27 Thanh Do, Goetz Graefe, Jeffrey Naughton

Sorting it out in Hardware: A State-of-the-Art Survey

Sorting is a fundamental operation in various applications and a traditional research topic in computer science. Improving the performance of sorting operations can have a significant impact on many application domains. For high-performance…

Hardware Architecture · Computer Science 2023-10-13 Amir Hossein Jalilvand, Faeze S. Banitaba, Seyedeh Newsha Estiri, Sercan Aygun, M. Hassan Najafi

Distributed Rate Scaling in Large-Scale Service Systems

We consider a large-scale parallel-server system, where each server independently adjusts its processing speed in a decentralized manner. The objective is to minimize the overall cost, which comprises the average cost of maintaining the…

Optimization and Control · Mathematics 2023-06-06 Daan Rutten, Martin Zubeldia, Debankur Mukherjee

List Sort: A New Approach for Sorting List to Reduce Execution Time

In this paper we propose a new sorting algorithm, List Sort, which is based on dynamic memory allocation. We also compare various efficient sorting techniques with List Sort. Due the…

Data Structures and Algorithms · Computer Science 2013-10-30 Adarsh Kumar Verma, Prashant Kumar

An Optimized Disk Scheduling Algorithm With Bad-Sector Management

In high-performance computing, researchers try to optimize CPU scheduling algorithms so that computers work faster and more efficiently. But a process needs both CPU-bound and I/O-bound phases to complete its execution. With modernization of…

Operating Systems · Computer Science 2019-08-06 Amar Ranjan Dash, Sandipta Kumar Sahu, B Kewal

Parallelizing Query Optimization on Shared-Nothing Architectures

Data processing systems offer an ever increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query…

Databases · Computer Science 2015-11-06 Immanuel Trummer, Christoph Koch

Memory-Based Multi-Processing Method For Big Data Computation

The evolution of the Internet and computer applications has generated a colossal amount of data. This data is referred to as Big Data and consists of huge-volume, high-velocity, and variable datasets that need to be managed at the right…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-13 Youssef Bassil