Related papers: Solving Dynamic Programming Problem by Pipeline Im…

Comparing MapReduce and Pipeline Implementations for Counting Triangles

A common method to define a parallel solution for a computational problem consists in finding a way to use the Divide and Conquer paradigm in order to have processors acting on its own data and scheduled in a parallel fashion. MapReduce is…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-13 Edelmira Pasarella, Maria-Esther Vidal, Cristina Zoltan

Parallel Triangles Counting Using Pipelining

The generalized method to have a parallel solution to a computational problem, is to find a way to use Divide & Conquer paradigm in order to have processors acting on its own data and therefore all can be scheduled in parallel. MapReduce is…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-13 Julián Aráoz, Cristina Zoltan

Massively Parallel Dynamic Programming on Trees

Dynamic programming is a powerful technique that is, unfortunately, often inherently sequential. That is, there exists no unified method to parallelize algorithms that use dynamic programming. In this paper, we attempt to address this issue…

Data Structures and Algorithms · Computer Science 2018-09-18 MohammadHossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Vahab Mirrokni

Compiler-Assisted Workload Consolidation For Efficient Dynamic Parallelism on GPU

GPUs have been widely used to accelerate computations exhibiting simple patterns of parallelism - such as flat or two-level parallelism - and a degree of parallelism that can be statically determined based on the size of the input dataset.…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-18 Hancheng Wu, Da Li, Michela Becchi

An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator

Realistic reservoir simulation is known to be prohibitively expensive in terms of computation time when increasing the accuracy of the simulation or by enlarging the model grid size. One method to address this issue is to parallelize the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-14 Tong Dong Qiu, Andreas Thune, Vinicius Oliveira Martins, Markus Blatt, Alf Birger Rustad, Razvan Nane

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-03 Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin

Accelerating a Linear Programming Algorithm on AMD GPUs

Linear Programming (LP) is a foundational optimization technique with widespread applications in finance, energy trading, and supply chain logistics. However, traditional Central Processing Unit (CPU)-based LP solvers often struggle to meet…

Optimization and Control · Mathematics 2025-08-26 Xiyan Hu, Titus Parker, Connor Phillips, Yifa Yu

Parallelizing Optimal Multiple Sequence Alignment by Dynamic Programming

Optimal multiple sequence alignment by dynamic programming, like many highly dimensional scientific computing problems, has failed to benefit from the improvements in computing performance brought about by multi-processor systems, due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-30 Manal Helal, Hossam El-Gindy, Lenore Mullin, Bruno Gaeta

Two-Dimensional Batch Linear Programming on the GPU

This paper presents a novel, high-performance, graphical processing unit-based algorithm for efficiently solving two-dimensional linear programs in batches. The domain of two-dimensional linear programs is particularly useful due to the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-14 John Charlton, Steve Maddock, Paul Richmond

GPU-based Split algorithm for Large-Scale CVRPSD

Dynamic programming (DP) is a cornerstone of combinatorial optimization, yet its inherently sequential structure has long limited its scalability in scenario-based stochastic programming (SP). This paper introduces a GPU-accelerated…

Optimization and Control · Mathematics 2025-11-25 Jingyi Zhao, Linxin Yang, Haohua Zhang, Tian Ding

GPU Acceleration of ADMM for Large-Scale Quadratic Programming

The alternating direction method of multipliers (ADMM) is a powerful operator splitting technique for solving structured convex optimization problems. Due to its relatively low per-iteration computational cost and ability to exploit…

Optimization and Control · Mathematics 2020-06-09 Michel Schubiger, Goran Banjac, John Lygeros

Solving Batched Linear Programs on GPU and Multicore CPU

Linear Programs (LPs) appear in a large number of applications and offloading them to the GPU is viable to gain performance. Existing work on offloading and solving an LP on GPU suggests that performance is gained from large sized LPs…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-09-27 Amit Gurung, Rajarshi Ray

Parallel algorithms for problems of cluster analysis with very large amount of data

In this paper we solve on GPUs massive problems with large amount of data, which are not appropriate for solution with the SIMD technology. For the given problem we consider a three-level parallelization. The multithreading of CPU is used…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-02-18 Natalya Litvinenko

Batched First-Order Methods for Parallel LP Solving in MIP

We present a batched first-order method for solving multiple linear programs in parallel on GPUs. Our approach extends the primal-dual hybrid gradient algorithm to efficiently solve batches of related linear programming problems that arise…

Optimization and Control · Mathematics 2026-01-30 Nicolas Blin, Stefano Gualandi, Christopher Maes, Andrea Lodi, Bartolomeo Stellato

Simultaneous Solving of Batched Linear Programs on a GPU

Linear Programs (LPs) appear in a large number of applications and offloading them to a GPU is viable to gain performance. Existing work on offloading and solving an LP on a GPU suggests that there is performance gain generally on large…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-26 Amit Gurung, Rajarshi Ray

Molecular dynamics simulations with many-body potentials on multiple GPUs - the implementation, package and performance

Molecular dynamics (MD) is an important research tool extensively applied in materials science. Running MD on a graphics processing unit (GPU) is an attractive new approach for accelerating MD simulations. Currently, GPU implementations of…

Computational Physics · Physics 2015-06-12 Qing Hou, Min Li, Yulu Zhou, Jiechao Cui, Zhenguo Cui, Jun Wang

Multistep schemes for solving backward stochastic differential equations on GPU

The goal of this work is to parallelize the multistep scheme for the numerical approximation of the backward stochastic differential equations (BSDEs) in order to achieve both, a high accuracy and a reduction of the computation time as…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-18 Lorenc Kapllani, Long Teng

DawnPiper: A Memory-scablable Pipeline Parallel Training Framework

Pipeline parallelism is a crucial paradigm for large-scale model training. However, imbalances in memory footprint across stages can lead to significant GPU memory wastage, limiting the model sizes that pipeline parallelism can effectively…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-12 Xuan Peng, Xuanhua Shi, Haolin Zhang, Yunfei Zhao, Xuehai Qian

GPU acceleration of an iterative scheme for gas-kinetic model equations with memory reduction techniques

This paper presents a Graphics Processing Units (GPUs) acceleration method of an iterative scheme for gas-kinetic model equations. Unlike the previous GPU parallelization of explicit kinetic schemes, this work features a fast converging…

Computational Physics · Physics 2020-01-08 Lianhua Zhu, Peng Wang, Songze Chen, Zhaoli Guo, Yonghao Zhang