Related papers: Towards Parallel Learned Sorting
We present a sorting algorithm that works in-place, executes in parallel, is cache-efficient, avoids branch-mispredictions, and performs work O(n log n) for arbitrary inputs with high probability. The main algorithmic contributions are new…
This work analyzes and parallelizes LearnedSort, the novel algorithm that sorts using machine learning models based on the cumulative distribution function. LearnedSort is analyzed under the lens of algorithms with predictions, and it is…
We present sorting algorithms that represent the fastest known techniques for a wide range of input sizes, input distributions, data types, and machines. A part of the speed advantage is due to the feature to work in-place. Previously, the…
We propose new sequential sorting operations by adapting techniques and methods used for designing parallel sorting algorithms. Although the norm is to parallelize a sequential algorithm to improve performance, we adapt a contrarian…
Sorting is one of the fundamental problems in computer science. Playing a role in many processes, it has a lower complexity bound imposed by $\mathcal{O}(n\log{n})$ when executing on a sequential machine. This limit can be brought down to…
Sorting is one of the most fundamental problems in the field of computer science. With the rapid development of manycore processors, it shows great importance to design efficient parallel sort algorithm on manycore architecture. This paper…
Previous parallel sorting algorithms do not scale to the largest available machines, since they either have prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and…
We investigate distributed memory parallel sorting algorithms that scale to the largest available machines and are robust with respect to input size and distribution of the input elements. The main outcome is that four sorting algorithms…
The increasingly deeper neural networks hinder the democratization of privacy-enhancing distributed learning, such as federated learning (FL), to resource-constrained devices. To overcome this challenge, in this paper, we advocate the…
Built upon the decision tree (DT) classification and regression idea, the subspace learning machine (SLM) has been recently proposed to offer higher performance in general classification and regression tasks. Its performance improvement is…
We present four high performance hybrid sorting methods developed for various parallel platforms: shared memory multiprocessors, distributed multiprocessors, and clusters taking advantage of existence of both shared and distributed memory.…
To minimize data movement, state-of-the-art parallel sorting algorithms use techniques based on sampling and histogramming to partition keys prior to redistribution. Sampling enables partitioning to be done using a representative subset of…
Parallel split learning (PSL) suffers from two intertwined issues: the effective batch size grows with the number of clients, and data that is not identically and independently distributed (non-IID) skews global batches. We present parallel…
The advent of efficient interior point optimization methods has enabled the tractable solution of large-scale linear and nonlinear programming (NLP) problems. A prominent example of such a method is seen in Ipopt, a widely-used, open-source…
This work aims to improve the sample efficiency of parallel large-scale ranking and selection (R&S) problems by leveraging correlation information. We modify the commonly used "divide and conquer" framework in parallel computing by adding a…
The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on…
Distributed model fitting refers to the process of fitting a mathematical or statistical model to the data using distributed computing resources, such that computing tasks are divided among multiple interconnected computers or nodes, often…
We propose an $O(N\cdot M)$ sorting algorithm by Machine Learning method, which shows a huge potential sorting big data. This sorting algorithm can be applied to parallel sorting and is suitable for GPU or TPU acceleration. Furthermore, we…
Most machine learning and deep neural network algorithms rely on certain iterative algorithms to optimise their utility/cost functions, e.g. Stochastic Gradient Descent. In distributed learning, the networked nodes have to work…
Sorting is one of the most basic algorithms, and developing highly parallel sorting programs is becoming increasingly important in high-performance computing because the number of CPU cores per node in modern supercomputers tends to…