Related papers: Low-Depth Parallel Algorithms for the Binary-Forki…

Optimal (Randomized) Parallel Algorithms in the Binary-Forking Model

In this paper we develop optimal algorithms in the binary-forking model for a variety of fundamental problems, including sorting, semisorting, list ranking, tree contraction, range minima, and ordered set union, intersection and difference.…

Data Structures and Algorithms · Computer Science 2020-06-26 Guy E. Blelloch, Jeremy T. Fineman, Yan Gu, Yihan Sun

A Many-core Machine Model for Designing Algorithms with Minimum Parallelism Overheads

We present a model of multithreaded computation, combining fork-join and single-instruction-multiple-data parallelisms, with an emphasis on estimating parallelism overheads of programs written for modern many-core architectures. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-02-04 Sardar Anisul Haque, Marc Moreno Maza, Ning Xie

Parallel Joinable B-Trees in the Fork-Join I/O Model

Balanced search trees are widely used in computer science to efficiently maintain dynamic ordered data. To support efficient set operations (e.g., union, intersection, difference) using trees, the join-based framework is widely studied.…

Data Structures and Algorithms · Computer Science 2025-10-24 Michael Goodrich, Yan Gu, Ryuto Kitagawa, Yihan Sun

Data Oblivious Algorithms for Multicores

As secure processors such as Intel SGX (with hyperthreading) become widely adopted, there is a growing appetite for private analytics on big data. Most prior works on data-oblivious algorithms adopt the classical PRAM model to capture…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-01 Vijaya Ramachandran, Elaine Shi

Parallel training of linear models without compromising convergence

In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start with a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks,…

Machine Learning · Computer Science 2018-12-20 Nikolas Ioannou, Celestine Dünner, Kornilios Kourtis, Thomas Parnell

Towards Work-Efficient Parallel Parameterized Algorithms

Parallel parameterized complexity theory studies how fixed-parameter tractable (fpt) problems can be solved in parallel. Previous theoretical work focused on parallel algorithms that are very fast in principle, but did not take into account…

Data Structures and Algorithms · Computer Science 2019-02-21 Max Bannach, Malte Skambath, Till Tantau

Algorithms in the Ultra-Wide Word Model

The effective use of parallel computing resources to speed up algorithms in current multi-core parallel architectures remains a difficult challenge, with ease of programming playing a key role in the eventual success of various parallel…

Data Structures and Algorithms · Computer Science 2014-12-09 Arash Farzan, Alejandro López-Ortiz, Patrick K. Nicholson, Alejandro Salinger

Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach

As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies-data, model, sequence, and…

Machine Learning · Computer Science 2025-03-13 Ruifeng She, Bowen Pang, Kai Li, Zehua Liu, Tao Zhong

Parallel Minimum Spanning Tree Algorithms and Evaluation

Minimum Spanning Tree (MST) is an important graph algorithm that has wide ranging applications in the areas of computer networks, VLSI routing, wireless communications among others. Today virtually every computer is built out of multi-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-15 Suryanarayana Murthy Durbhakula

Primitives for Dynamic Big Model Parallelism

When training large machine learning models with many variables or parameters, a single machine is often inadequate since the model may be too large to fit in memory, while training can take a long time even with stochastic updates. A…

Machine Learning · Statistics 2014-06-19 Seunghak Lee, Jin Kyu Kim, Xun Zheng, Qirong Ho, Garth A. Gibson, Eric P. Xing

On the Design and Analysis of Parallel and Distributed Algorithms

Arrival of multicore systems has enforced a new scenario in computing, the parallel and distributed algorithms are fast replacing the older sequential algorithms, with many challenges of these techniques. The distributed algorithms provide…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-13 Rajendra Purohit, K R Chowdhary, S D Purohit

A Partition-insensitive Parallel Framework for Distributed Model Fitting

Distributed model fitting refers to the process of fitting a mathematical or statistical model to the data using distributed computing resources, such that computing tasks are divided among multiple interconnected computers or nodes, often…

Computation · Statistics 2024-06-04 Xiaofei Wu, Rongmei Liang, Fabio Roli, Marcello Pelillo, Jing Yuan

Effective Parallelisation for Machine Learning

We present a novel parallelisation scheme that simplifies the adaptation of learning algorithms to growing amounts of data as well as growing needs for accurate and confident predictions in critical applications. In contrast to other…

Machine Learning · Computer Science 2018-10-09 Michael Kamp, Mario Boley, Olana Missura, Thomas Gärtner

An Optimal Level-synchronous Shared-memory Parallel BFS Algorithm with Optimal parallel Prefix-sum Algorithm and its Implications for Energy Consumption

We present a work-efficient parallel level-synchronous Breadth First Search (BFS) algorithm for shared-memory architectures which achieves the theoretical lower bound on parallel running time. The optimality holds regardless of the shape of…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-20 Jesmin Jahan Tithi, Yonatan Fogel, Rezaul Chowdhury

Parallel Algorithms for Tensor Train Arithmetic

We present efficient and scalable parallel algorithms for performing mathematical operations for low-rank tensors represented in the tensor train (TT) format. We consider algorithms for addition, elementwise multiplication, computing norms…

Numerical Analysis · Mathematics 2021-09-08 Hussam Al Daas, Grey Ballard, Peter Benner

A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures

As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…

Mathematical Software · Computer Science 2008-06-12 Alfredo Buttari, Julien Langou, Jakub Kurzak, Jack Dongarra

An Experiment on Parallel Model Checking of a CTL Fragment

We propose a parallel algorithm for local, on the fly, model checking of a fragment of CTL that is well-suited for modern, multi-core architectures. This model-checking algorithm takes benefit from a parallel state space construction…

Logic in Computer Science · Computer Science 2013-02-01 Rodrigo Tacla Saad, Silvano Dal Zilio, Bernard Berthomieu

High-performance Kernel Machines with Implicit Distributed Optimization and Randomization

In order to fully utilize "big data", it is often required to use "big models". Such models tend to grow with the complexity and size of the training data, and do not make strong parametric assumptions upfront on the nature of the…

Machine Learning · Statistics 2015-04-17 Vikas Sindhwani, Haim Avron

Worksharing Tasks: An Efficient Way to Exploit Irregular and Fine-Grained Loop Parallelism

Shared memory programming models usually provide worksharing and task constructs. The former relies on the efficient fork-join execution model to exploit structured parallelism; while the latter relies on fine-grained synchronization among…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-08 M. Maronas, K. Sala, S. Mateo, E. Ayguadé, V. Beltran (Barcelona Supercomputing Center)

Efficient Parallel and Out of Core Algorithms for Constructing Large Bi-directed de Bruijn Graphs

Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories -- based on the data structures which…

Data Structures and Algorithms · Computer Science 2010-03-10 Vamsi Kundeti, Sanguthevar Rajasekaran, Hieu Dinh