Related papers: Towards Parallel Learned Sorting

In-place Parallel Super Scalar Samplesort (IPS$^4$o)

We present a sorting algorithm that works in-place, executes in parallel, is cache-efficient, avoids branch-mispredictions, and performs work O(n log n) for arbitrary inputs with high probability. The main algorithmic contributions are new…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-07-03 Michael Axtmann, Sascha Witt, Daniel Ferizovic, Peter Sanders

LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization

This work analyzes and parallelizes LearnedSort, the novel algorithm that sorts using machine learning models based on the cumulative distribution function. LearnedSort is analyzed under the lens of algorithms with predictions, and it is…

Machine Learning · Computer Science 2023-08-30 Ivan Carvalho, Ramon Lawrence

Engineering In-place (Shared-memory) Sorting Algorithms

We present sorting algorithms that represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. A part of the speed advantage is due to the feature to work in-place. Previously, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-05 Michael Axtmann, Sascha Witt, Daniel Ferizovic, Peter Sanders

Using parallelism techniques to improve sequential and multi-core sorting performance

We propose new sequential sorting operations by adapting techniques and methods used for designing parallel sorting algorithms. Although the norm is to parallelize a sequential algorithm to improve performance, we adapt a contrarian…

Data Structures and Algorithms · Computer Science 2016-09-01 Alexandros V Gerbessiotis

Hourglass Sorting: A novel parallel sorting algorithm and its implementation

Sorting is one of the fundamental problems in computer science. Playing a role in many processes, it has a lower complexity bound imposed by $\mathcal{O}(n\log{n})$ when executing on a sequential machine. This limit can be brought down to…

Hardware Architecture · Computer Science 2025-07-23 Daniel Bascones, Borja Morcillo

A Creativity Survey of Parallel Sorting Algorithm

Sorting is one of the most fundamental problems in the field of computer science. With the rapid development of manycore processors, it shows great importance to design efficient parallel sort algorithm on manycore architecture. This paper…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-18 Tianyi Yu, Wei Li

Practical Massively Parallel Sorting

Previous parallel sorting algorithms do not scale to the largest available machines, since they either have prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and…

Data Structures and Algorithms · Computer Science 2015-02-26 Michael Axtmann, Timo Bingmann, Peter Sanders, Christian Schulz

Robust Massively Parallel Sorting

We investigate distributed memory parallel sorting algorithms that scale to the largest available machines and are robust with respect to input size and distribution of the input elements. The main outcome is that four sorting algorithms…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-17 Michael Axtmann, Peter Sanders

Efficient Parallel Split Learning over Resource-constrained Wireless Edge Networks

The increasingly deeper neural networks hinder the democratization of privacy-enhancing distributed learning, such as federated learning (FL), to resource-constrained devices. To overcome this challenge, in this paper, we advocate the…

Machine Learning · Computer Science 2024-01-25 Zheng Lin, Guangyu Zhu, Yiqin Deng, Xianhao Chen, Yue Gao, Kaibin Huang, Yuguang Fang

Acceleration of Subspace Learning Machine via Particle Swarm Optimization and Parallel Processing

Built upon the decision tree (DT) classification and regression idea, the subspace learning machine (SLM) has been recently proposed to offer higher performance in general classification and regression tasks. Its performance improvement is…

Machine Learning · Computer Science 2022-08-16 Hongyu Fu, Yijing Yang, Yuhuai Liu, Joseph Lin, Ethan Harrison, Vinod K. Mishra, C.-C. Jay Kuo

High Performance Parallel Sort for Shared and Distributed Memory MIMD

We present four high performance hybrid sorting methods developed for various parallel platforms: shared memory multiprocessors, distributed multiprocessors, and clusters taking advantage of existence of both shared and distributed memory.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-04 Thoria Alghamdi, Gita Alaghband

Histogram Sort with Sampling

To minimize data movement, state-of-the-art parallel sorting algorithms use techniques based on sampling and histogramming to partition keys prior to redistribution. Sampling enables partitioning to be done using a representative subset of…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-30 Vipul Harsh, Laxmikant Kale, Edgar Solomonik

Parallel Split Learning with Global Sampling

Parallel split learning (PSL) suffers from two intertwined issues: the effective batch size grows with the number of clients, and data that is not identically and independently distributed (non-IID) skews global batches. We present parallel…

Machine Learning · Computer Science 2026-03-06 Mohammad Kohankhaki, Ahmad Ayad, Mahdi Barhoush, Anke Schmeink

Exploring Benefits of Linear Solver Parallelism on Modern Nonlinear Optimization Applications

The advent of efficient interior point optimization methods has enabled the tractable solution of large-scale linear and nonlinear programming (NLP) problems. A prominent example of such a method is seen in Ipopt, a widely-used, open-source…

Optimization and Control · Mathematics 2019-09-19 Byron Tasseff, Carleton Coffrin, Andreas Wächter, Carl Laird

Sample-Efficient "Clustering and Conquer" Procedures for Parallel Large-Scale Ranking and Selection

This work aims to improve the sample efficiency of parallel large-scale ranking and selection (R&S) problems by leveraging correlation information. We modify the commonly used "divide and conquer" framework in parallel computing by adding a…

Methodology · Statistics 2026-02-16 Zishi Zhang, Yijie Peng

Parallelizing the Approximate Minimum Degree Ordering Algorithm: Strategies and Evaluation

The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-26 Yen-Hsiang Chang, Aydın Buluç, James Demmel

A Partition-insensitive Parallel Framework for Distributed Model Fitting

Distributed model fitting refers to the process of fitting a mathematical or statistical model to the data using distributed computing resources, such that computing tasks are divided among multiple interconnected computers or nodes, often…

Computation · Statistics 2024-06-04 Xiaofei Wu, Rongmei Liang, Fabio Roli, Marcello Pelillo, Jing Yuan

An $O(N)$ Sorting Algorithm: Machine Learning Sort

We propose an $O(N\cdot M)$ sorting algorithm by Machine Learning method, which shows a huge potential sorting big data. This sorting algorithm can be applied to parallel sorting and is suitable for GPU or TPU acceleration. Furthermore, we…

Machine Learning · Computer Science 2018-08-16 Hanqing Zhao, Yuehan Luo

Probabilistic Synchronous Parallel

Most machine learning and deep neural network algorithms rely on certain iterative algorithms to optimise their utility/cost functions, e.g. Stochastic Gradient Descent. In distributed learning, the networked nodes have to work…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-06 Liang Wang, Ben Catterall, Richard Mortier

Performance Evaluation of Parallel Sortings on the Supercomputer Fugaku

Sorting is one of the most basic algorithms, and developing highly parallel sorting programs is becoming increasingly important in high-performance computing because the number of CPU cores per node in modern supercomputers tends to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-08 Tomoyuki Tokuue, Tomoaki Ishiyama