Related papers: Communication-Efficient String Sorting
String sorting is an important part of tasks such as building index data structures. Unfortunately, current string sorting algorithms do not scale to massively parallel distributed-memory machines since they either have latency (at least)…
We engineer algorithms for sorting huge data sets on massively parallel machines. The algorithms are based on the multiway merging paradigm. We first outline an algorithm whose I/O requirement is close to a lower bound. Thus, in contrast to…
We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we…
We investigate distributed memory parallel sorting algorithms that scale to the largest available machines and are robust with respect to input size and distribution of the input elements. The main outcome is that four sorting algorithms…
We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we…
Previous parallel sorting algorithms do not scale to the largest available machines, since they either have prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and…
Counting triangles in a graph and incident to each vertex is a fundamental and frequently considered task of graph analysis. We consider how to efficiently do this for huge graphs using massively parallel distributed-memory machines.…
We present scalable parallel algorithms with sublinear per-processor communication volume and low latency for several fundamental problems related to finding the most relevant elements in a set, for various notions of relevance: We begin…
Bitmap indexes must be compressed to reduce input/output costs and minimize CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid (WAH)…
Sorting is one of the most fundamental problems in the field of computer science. With the rapid development of manycore processors, it shows great importance to design efficient parallel sort algorithm on manycore architecture. This paper…
Upcoming many core processors are expected to employ a distributed memory architecture similar to currently available supercomputers, but parallel pattern mining algorithms amenable to the architecture are not comprehensively studied. We…
Sorting is one of the most basic algorithms, and developing highly parallel sorting programs is becoming increasingly important in high-performance computing because the number of CPU cores per node in modern supercomputers tends to…
Merging two sorted arrays is a prominent building block for sorting and other functions. Its efficient parallelization requires balancing the load among compute cores, minimizing the extra work brought about by parallelization, and…
This note makes an observation that significantly simplifies a number of previous parallel, two-way merge algorithms based on binary search and sequential merge in parallel. First, it is shown that the additional merge step of distinguished…
Sorting algorithms are the deciding factor for the performance of common operations such as removal of duplicates or database sort-merge joins. This work focuses on 32-bit integer keys, optionally paired with a 32-bit value. We present a…
This thesis is concerned with the design of distributed algorithms for solving optimization problems. We consider networks where each node has exclusive access to a cost function, and design algorithms that make all nodes cooperate to find…
Many parallel algorithms which solve basic problems in computer science use auxiliary space linear in the input to facilitate conflict-free computation. There has been significant work on improving these parallel algorithms to be in-place,…
We study the scalability of consensus-based distributed optimization algorithms by considering two questions: How many processors should we use for a given problem, and how often should they communicate when communication is not free?…
In-place associative integer sorting technique was proposed for integer lists which requires only constant amount of additional memory replacing bucket sort, distribution counting sort and address calculation sort family of algorithms. The…
This dissertation focuses on two fundamental sorting problems: string sorting and suffix sorting. The first part considers parallel string sorting on shared-memory multi-core machines, the second part external memory suffix sorting using…