Related papers: Scalable Distributed-Memory External Sorting
Previous parallel sorting algorithms do not scale to the largest available machines, since they either have prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and…
There has been surprisingly little work on algorithms for sorting strings on distributed-memory parallel machines. We develop efficient algorithms for this problem based on the multi-way merging principle. These algorithms inspect only…
We investigate distributed memory parallel sorting algorithms that scale to the largest available machines and are robust with respect to input size and distribution of the input elements. The main outcome is that four sorting algorithms…
In the age of big data, sorting is an indispensable operation for DBMSes and similar systems. Having data sorted can help produce query plans with significantly lower run times. It also can provide other benefits like having non-blocking…
String sorting is an important part of tasks such as building index data structures. Unfortunately, current string sorting algorithms do not scale to massively parallel distributed-memory machines since they either have latency (at least)…
Sorting is a fundamental and well studied problem that has been studied extensively. Sorting plays an important role in the area of databases, as many queries can be served much faster if the relations are first sorted. One of the most…
External sorting is at the core of many operations in large-scale database systems, such as ordering and aggregation queries for large result sets, building indexes, sort-merge joins, duplicate removal, sharding, and record clustering.…
Database query processing requires algorithms for duplicate removal, grouping, and aggregation. Three algorithms exist: in-stream aggregation is most efficient by far but requires sorted input; sort-based aggregation relies on external…
Edge-centric distributed computations have appeared as a recent technique to improve the shortcomings of think-like-a-vertex algorithms on large scale-free networks. In order to increase parallelism on this model, edge partitioning -…
Merging-based sorting networks are an important family of sorting networks. Most merge sorting networks are based on 2-way or multi-way merging algorithms using 2-sorters as basic building blocks. An alternative is to use n-sorters, instead…
Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation in processing graphs. Recently, size, variety, and structural complexity of these networks has grown dramatically.…
Two emerging hardware trends will dominate the database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques…
Sorting is a fundamental operation across numerous computational domains. Traditionally, this process involves transferring data from main memory to a processing unit for sorting, followed by writing the sorted data back to memory. This…
This dissertation focuses on two fundamental sorting problems: string sorting and suffix sorting. The first part considers parallel string sorting on shared-memory multi-core machines, the second part external memory suffix sorting using…
We propose new sequential sorting operations by adapting techniques and methods used for designing parallel sorting algorithms. Although the norm is to parallelize a sequential algorithm to improve performance, we adapt a contrarian…
In this paper, we present several improvements in the parallelization of the in-place merge algorithm, which merges two contiguous sorted arrays into one with an O(T) space complexity (where T is the number of threads). The approach divides…
Emerging memory technologies have a significant gap between the cost, both in time and in energy, of writing to memory versus reading from memory. In this paper we present models and algorithms that account for this difference, with a focus…
Sorting and hashing are two completely different concepts in computer science, and appear mutually exclusive to one another. Hashing is a search method using the data as a key to map to the location within memory, and is used for rapid…
Sorting is one of the most fundamental problems in the field of computer science. With the rapid development of manycore processors, it shows great importance to design efficient parallel sort algorithm on manycore architecture. This paper…
Clustering is an unsupervised learning technique in which data or objects are grouped into sets based on some similarity measure. Most of the clustering algorithms assume that the main memory is infinite and can accommodate the set of…