Related papers: Same Engine, Multiple Gears: Parallelizing Fixpoin…

Parallel training of linear models without compromising convergence

In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start with a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks,…

Machine Learning · Computer Science 2018-12-20 Nikolas Ioannou, Celestine Dünner, Kornilios Kourtis, Thomas Parnell

Towards Work-Efficient Parallel Parameterized Algorithms

Parallel parameterized complexity theory studies how fixed-parameter tractable (fpt) problems can be solved in parallel. Previous theoretical work focused on parallel algorithms that are very fast in principle, but did not take into account…

Data Structures and Algorithms · Computer Science 2019-02-21 Max Bannach, Malte Skambath, Till Tantau

A Unifying Framework for Parallelizing Sequential Models with Linear Dynamical Systems

Harnessing parallelism in seemingly sequential models is a central challenge for modern machine learning. Several approaches have been proposed for evaluating sequential processes in parallel using iterative fixed-point methods, like…

Machine Learning · Computer Science 2026-04-06 Xavier Gonzalez, E. Kelly Buchanan, Hyun Dong Lee, Jerry Weihong Liu, Ke Alexander Wang, David M. Zoltowski, Leo Kozachkov, Christopher Ré, Scott W. Linderman

Parallel Algorithms for Core Maintenance in Dynamic Graphs

This paper initiates the study of parallel algorithms for core maintenance in dynamic graphs. The core number is a fundamental index reflecting the cohesiveness of a graph, which is widely used in large-scale graph analytics. The core…

Data Structures and Algorithms · Computer Science 2017-01-02 Na Wang, Dongxiao Yu, Hai Jin, Chen Qian, Xia Xie, Qiang-Sheng Hua

An Easy-to-use Scalable Framework for Parallel Recursive Backtracking

Supercomputers are equipped with an increasingly large number of cores to use computational power as a way of solving problems that are otherwise intractable. Unfortunately, getting serial algorithms to run in parallel to take advantage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-12-31 Faisal N. Abu-Khzam, Khuzaima Daudjee, Amer E. Mouawad, Naomi Nishimura

Parallelizing non-linear sequential models over the sequence length

Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherent sequential nature. For many years this bottleneck has persisted, as many thought…

Machine Learning · Computer Science 2024-01-17 Yi Heng Lim, Qi Zhu, Joshua Selfridge, Muhammad Firmansyah Kasim

A Parallel and Efficient Algorithm for Learning to Match

Many tasks in data mining and related fields can be formalized as matching between objects in two heterogeneous domains, including collaborative filtering, link prediction, image tagging, and web search. Machine learning techniques,…

Machine Learning · Computer Science 2014-10-24 Jingbo Shang, Tianqi Chen, Hang Li, Zhengdong Lu, Yong Yu

Parallelizing Deadlock Resolution in Symbolic Synthesis of Distributed Programs

Previous work has shown that there are two major complexity barriers in the synthesis of fault-tolerant distributed programs: (1) generation of fault-span, the set of states reachable in the presence of faults, and (2) resolving deadlock…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-12-15 Fuad Abujarad, Borzoo Bonakdarpour, Sandeep S. Kulkarni

Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

Transformer models have achieved state-of-the-art performance in various application domains and have gradually become the foundation of advanced large deep learning (DL) models. However, how to train these models over multiple GPUs…

Machine Learning · Computer Science 2022-11-28 Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui

Parallelisation of a Common Changepoint Detection Method

In recent years, various means of efficiently detecting changepoints in the univariate setting have been proposed, with one popular approach involving minimising a penalised cost function using dynamic programming. In some situations, these…

Methodology · Statistics 2018-10-09 S. O. Tickle, I. A. Eckley, P. Fearnhead, K. Haynes

An Efficient and Flexible Engine for Computing Fixed Points

An efficient and flexible engine for computing fixed points is critical for many practical applications. In this paper, we first present a goal-directed fixed point computation strategy in the logic programming paradigm. The strategy…

Programming Languages · Computer Science 2007-05-23 Hai-Feng Guo, Gopal Gupta

An Inertial Parallel and Asynchronous Fixed-Point Iteration for Convex Optimization

Two characteristics that make convex decomposition algorithms attractive are simplicity of operations and generation of parallelizable structures. In principle, these schemes require that all coordinates update at the same time, i.e., they…

Optimization and Control · Mathematics 2018-03-07 Giorgos Stathopoulos, Colin N. Jones

A Compiler Framework for Optimizing Dynamic Parallelism on GPUs

Dynamic parallelism on GPUs allows GPU threads to dynamically launch other GPU threads. It is useful in applications with nested parallelism, particularly where the amount of nested parallelism is irregular and cannot be predicted…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-11 Mhd Ghaith Olabi, Juan Gómez Luna, Onur Mutlu, Wen-mei Hwu, Izzat El Hajj

Parallel Iterated Extended and Sigma-point Kalman Smoothers

The problem of Bayesian filtering and smoothing in nonlinear models with additive noise is an active area of research. Classical Taylor series as well as more recent sigma-point based methods are two well-known strategies to deal with these…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-02 Fatemeh Yaghoobi, Adrien Corenflos, Sakira Hassan, Simo Särkkä

Evaluating Massively Parallel Algorithms for DFA Minimisation, Equivalence Checking and Inclusion Checking

We study parallel algorithms for the minimisation and equivalence checking of Deterministic Finite Automata (DFAs). Regarding DFA minimisation, we implement four different massively parallel algorithms on Graphics Processing Units (GPUs).…

Formal Languages and Automata Theory · Computer Science 2025-08-29 Jan Heemstra, Jan Martens, Anton Wijs

Interactive Abstract Interpretation: Reanalyzing Whole Programs for Cheap

To put static program analysis at the fingertips of the software developer, we propose a framework for interactive abstract interpretation. While providing sound analysis results, abstract interpretation in general can be quite costly. To…

Programming Languages · Computer Science 2022-11-28 Julian Erhard, Simmo Saan, Sarah Tilscher, Michael Schwarz, Karoliine Holter, Vesal Vojdani, Helmut Seidl

A Multilevel Approach for the Performance Analysis of Parallel Algorithms

We provide a multilevel approach for analysing the performance of parallel algorithms. The main outcome of this approach is that the algorithm is described by using a set of operators which are related to each other according to the problem…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-18 Luisa D'Amore, Valeria Mele, Diego Romano, Giuliano Laccetti

Efficient Parallel Computation of the Estimated Covariance Matrix

Computation of a signal's estimated covariance matrix is an important building block in signal processing, e.g., for spectral estimation. Each matrix element is a sum of products of elements in the input matrix taken over a sliding window.…

Data Structures and Algorithms · Computer Science 2013-03-12 Oded Green, Lior David, Ami Galperin, Yitzhak Birk

Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

There are billions of lines of sequential code in today's software that do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the…

Programming Languages · Computer Science 2016-04-13 Alcides Fonseca, Bruno Cabral, João Rafael, Ivo Correia

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari