Related papers: Efficient sorting, duplicate removal, grouping, an…

Scalable Distributed-Memory External Sorting

We engineer algorithms for sorting huge data sets on massively parallel machines. The algorithms are based on the multiway merging paradigm. We first outline an algorithm whose I/O requirement is close to a lower bound. Thus, in contrast to…

Data Structures and Algorithms · Computer Science 2009-10-15 Mirko Rahn, Peter Sanders, Johannes Singler

Hash sort: A linear time complexity multiple-dimensional sort algorithm

Sorting and hashing are two completely different concepts in computer science, and appear mutually exclusive to one another. Hashing is a search method using the data as a key to map to the location within memory, and is used for rapid…

Data Structures and Algorithms · Computer Science 2007-05-23 William F. Gilreath

[Experiments & Analysis] Hash-Based vs. Sort-Based Group-By-Aggregate: A Focused Empirical Study [Extended Version]

Group-by-aggregate (GBA) queries are integral to data analysis, allowing users to group data by specific attributes and apply aggregate functions such as sum, average, and count. Database Management Systems (DBMSs) typically execute GBA…

Databases · Computer Science 2024-12-03 Gaurav Vaghasiya, Shiva Jahangiri

Global Hash Tables Strike Back! An Analysis of Parallel GROUP BY Aggregation

Efficiently computing group aggregations (i.e., GROUP BY) on modern architectures is critical for analytic database systems. Hash-based approaches in today's engines predominantly use a partitioned approach, in which incoming data is…

Databases · Computer Science 2025-09-08 Daniel Xue, Ryan Marcus

Iterative Optimization and Simplification of Hierarchical Clusterings

Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search…

Artificial Intelligence · Computer Science 2014-11-17 D. Fisher

Offset-value coding in database query processing

Recent work shows how offset-value coding speeds up database query execution, not only sorting but also duplicate removal and grouping (aggregation) in sorted streams, order-preserving exchange (shuffle), merge join, and more. It already…

Databases · Computer Science 2023-02-20 Goetz Graefe, Thanh Do

A Survey of Distributed Data Aggregation Algorithms

Distributed data aggregation is an important task, allowing the decentralized determination of meaningful global properties, that can then be used to direct the execution of other applications. The resulting values result from the…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-10-05 Paulo Jesus, Carlos Baquero, Paulo Sérgio Almeida

Implementing the Comparison-Based External Sort

In the age of big data, sorting is an indispensable operation for DBMSes and similar systems. Having data sorted can help produce query plans with significantly lower run times. It also can provide other benefits like having non-blocking…

Databases · Computer Science 2022-07-27 Michael Polyntsov, Valentin Grigorev, Kirill Smirnov, George Chernishev

Rahmani Sort: A Novel Variant of Insertion Sort Algorithm with O(nlogn) Complexity

Various decision support systems are available that implement Data Mining and Data Warehousing techniques for diving into the sea of data for getting useful patterns of knowledge (pearls). Classification, regression, clustering, and many…

Cryptography and Security · Computer Science 2024-03-01 Mohammad Khalid Imam Rahmani

An Aggregate and Iterative Disaggregate Algorithm with Proven Optimality in Machine Learning

We propose a clustering-based iterative algorithm to solve certain optimization problems in machine learning, where we start the algorithm by aggregating the original data, solving the problem on aggregated data, and then in subsequent…

Machine Learning · Statistics 2017-01-23 Young Woong Park, Diego Klabjan

Accelerating Big-Data Sorting Through Programmable Switches

Sorting is a fundamental problem that has been studied extensively. Sorting plays an important role in the area of databases, as many queries can be served much faster if the relations are first sorted. One of the most…

Databases · Computer Science 2021-03-29 Yamit Barshatz-Schneor, Roy Friedman

An efficient sorting algorithm - Ultimate Heapsort(UHS)

Motivated by the development of computer theory, new sorting algorithms keep emerging. Inspired by the decrease-and-conquer method, we propose a brand-new sorting algorithm, Ultimate Heapsort. The algorithm consists of two…

Data Structures and Algorithms · Computer Science 2019-02-04 Feiyang Chen, Nan Chen, Hanyang Mao, Hanlin Hu

Stream Aggregation Through Order Sampling

This paper introduces a new single-pass reservoir weighted-sampling stream aggregation algorithm, Priority-Based Aggregation (PBA). While order sampling is a powerful and efficient method for weighted sampling from a stream of uniquely…

Data Structures and Algorithms · Computer Science 2017-11-02 Nick Duffield, Yunhong Xu, Liangzhen Xia, Nesreen Ahmed, Minlan Yu

The Adaptive Priority Queue with Elimination and Combining

Priority queues are fundamental abstract data structures, often used to manage limited resources in parallel programming. Several proposed parallel priority queue implementations are based on skiplists, harnessing the potential for…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-08-06 Irina Calciu, Hammurabi Mendes, Maurice Herlihy

Histogram Sort with Sampling

To minimize data movement, state-of-the-art parallel sorting algorithms use techniques based on sampling and histogramming to partition keys prior to redistribution. Sampling enables partitioning to be done using a representative subset of…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-30 Vipul Harsh, Laxmikant Kale, Edgar Solomonik

Sort Race

Sorting is one of the oldest computing problems and is still very important in the age of big data. Various algorithms and implementation techniques have been proposed. In this study, we focus on comparison based, internal sorting…

Data Structures and Algorithms · Computer Science 2016-09-16 Hantao Zhang, Baoluo Meng, Yiwen Liang

Memory-Efficient Group-by Aggregates over Multi-Way Joins

Aggregate computation in relational databases has long been done using the standard unary aggregation and binary join operators. These implement the classical model of computing joins between relations two at a time, materializing the…

Databases · Computer Science 2019-06-18 Konstantinos Xirogiannopoulos, Amol Deshpande

Efficient techniques for mining spatial databases

Clustering is one of the major tasks in data mining. In the last few years, Clustering of spatial data has received a lot of research attention. Spatial databases are components of many advanced information systems like geographic…

Databases · Computer Science 2012-06-04 Mohamed A. El-Zawawy

Aggregation and Ordering in Factorised Databases

A common approach to data analysis involves understanding and manipulating succinct representations of data. In earlier work, we put forward a succinct representation system for relational data called factorised databases and reported on…

Databases · Computer Science 2013-07-02 Nurzhan Bakibayev, Tomáš Kočiský, Dan Olteanu, Jakub Závodný

Communication-Efficient String Sorting

There has been surprisingly little work on algorithms for sorting strings on distributed-memory parallel machines. We develop efficient algorithms for this problem based on the multi-way merging principle. These algorithms inspect only…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-24 Timo Bingmann, Peter Sanders, Matthias Schimek