Related papers: Faster Radix Sort via Virtual Memory and Write-Com…
Sorting is at the core of many database operations, such as index creation, sort-merge joins, and user-requested output sorting. As GPUs are emerging as a promising platform to accelerate various operations, sorting on GPUs becomes a viable…
Sorting is a fundamental operation across numerous computational domains. Traditionally, this process involves transferring data from main memory to a processing unit for sorting, followed by writing the sorted data back to memory. This…
Sorting is an essential operation in computer science with direct consequences on the performance of large scale data systems, real-time systems, and embedded computation. However, no sorting algorithm is optimal under all distributions of…
Cloud database systems, particularly their middleware and query execution layers, use sorting as a core operation in query processing, indexing and join execution. Distribution-dependence and limited parallelism are key issues inherent in…
There have been many proposals for sorting integers on multicores/GPUs that include radix-sort and its variants or other approaches that exploit specialized hardware features of a particular multicore architecture. Comparison-based…
In this paper, we present the design of a sample sort algorithm for manycore GPUs. Despite being one of the most efficient comparison-based sorting algorithms for distributed memory architectures its performance on GPUs was previously…
In this paper we introduce RADULS2, the fastest parallel sorter based on radix algorithm. It is optimized to process huge amounts of data making use of modern multicore CPUs. The main novelties include: extremely optimized algorithm for…
This paper describes in detail the bitonic sort algorithm,and implements the bitonic sort algorithm based on cuda architecture.At the same time,we conduct two effective optimization of implementation details according to the characteristics…
Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH)…
Bucket sort and RADIX sort are two well-known integer sorting algorithms. This paper measures empirically what is the time usage and memory consumption for different kinds of input sequences. The algorithms are compared both from a…
We present four high performance hybrid sorting methods developed for various parallel platforms: shared memory multiprocessors, distributed multiprocessors, and clusters taking advantage of existence of both shared and distributed memory.…
We present sorting algorithms that represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. A part of the speed advantage is due to the feature to work in-place. Previously, the…
We investigate distributed memory parallel sorting algorithms that scale to the largest available machines and are robust with respect to input size and distribution of the input elements. The main outcome is that four sorting algorithms…
Sorting is needed in many application domains. The data is read from memory and sent to a general purpose processor or application specific hardware for sorting. The sorted data is then written back to the memory. Reading/writing data…
Smart Sort algorithm is a "smart" fusion of heap construction procedures (of Heap sort algorithm) into the conventional "Partition" function (of Quick sort algorithm) resulting in a robust version of Quick sort algorithm. We have also…
Memristive in-memory sorting has been proposed recently to improve hardware sorting efficiency. Using iterative in-memory min computations, data movements between memory and external processing units can be eliminated for improved latency…
Integer sorting on multicores and GPUs can be realized by a variety of approaches that include variants of distribution-based methods such as radix-sort, comparison-oriented algorithms such as deterministic regular sampling and random…
We engineer algorithms for sorting huge data sets on massively parallel machines. The algorithms are based on the multiway merging paradigm. We first outline an algorithm whose I/O requirement is close to a lower bound. Thus, in contrast to…
Sorting is the one of the fundamental tasks of modern data management systems. With Disk I/O being the most-accused performance bottleneck and more computation-intensive workloads, it has come to our attention that in heterogeneous…
Sorting is one of the most basic algorithms, and developing highly parallel sorting programs is becoming increasingly important in high-performance computing because the number of CPU cores per node in modern supercomputers tends to…