Related papers: Solving Dynamic Programming Problem by Pipeline Im…
A common method to define a parallel solution for a computational problem consists in finding a way to use the Divide and Conquer paradigm in order to have processors acting on its own data and scheduled in a parallel fashion. MapReduce is…
The generalized method to have a parallel solution to a computational problem, is to find a way to use Divide & Conquer paradigm in order to have processors acting on its own data and therefore all can be scheduled in parallel. MapReduce is…
Dynamic programming is a powerful technique that is, unfortunately, often inherently sequential. That is, there exists no unified method to parallelize algorithms that use dynamic programming. In this paper, we attempt to address this issue…
GPUs have been widely used to accelerate computations exhibiting simple patterns of parallelism - such as flat or two-level parallelism - and a degree of parallelism that can be statically determined based on the size of the input dataset.…
Realistic reservoir simulation is known to be prohibitively expensive in terms of computation time when increasing the accuracy of the simulation or by enlarging the model grid size. One method to address this issue is to parallelize the…
Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…
It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However,…
Linear Programming (LP) is a foundational optimization technique with widespread applications in finance, energy trading, and supply chain logistics. However, traditional Central Processing Unit (CPU)-based LP solvers often struggle to meet…
Optimal multiple sequence alignment by dynamic programming, like many highly dimensional scientific computing problems, has failed to benefit from the improvements in computing performance brought about by multi-processor systems, due to…
This paper presents a novel, high-performance, graphical processing unit-based algorithm for efficiently solving two-dimensional linear programs in batches. The domain of two-dimensional linear programs is particularly useful due to the…
Dynamic programming (DP) is a cornerstone of combinatorial optimization, yet its inherently sequential structure has long limited its scalability in scenario-based stochastic programming (SP). This paper introduces a GPU-accelerated…
The alternating direction method of multipliers (ADMM) is a powerful operator splitting technique for solving structured convex optimization problems. Due to its relatively low per-iteration computational cost and ability to exploit…
Linear Programs (LPs) appear in a large number of applications and offloading them to the GPU is viable to gain performance. Existing work on offloading and solving an LP on GPU suggests that performance is gained from large sized LPs…
In this paper we solve on GPUs massive problems with large amount of data, which are not appropriate for solution with the SIMD technology. For the given problem we consider a three-level parallelization. The multithreading of CPU is used…
We present a batched first-order method for solving multiple linear programs in parallel on GPUs. Our approach extends the primal-dual hybrid gradient algorithm to efficiently solve batches of related linear programming problems that arise…
Linear Programs (LPs) appear in a large number of applications and offloading them to a GPU is viable to gain performance. Existing work on offloading and solving an LP on a GPU suggests that there is performance gain generally on large…
Molecular dynamics (MD) is an important research tool extensively applied in materials science. Running MD on a graphics processing unit (GPU) is an attractive new approach for accelerating MD simulations. Currently, GPU implementations of…
The goal of this work is to parallelize the multistep scheme for the numerical approximation of the backward stochastic differential equations (BSDEs) in order to achieve both, a high accuracy and a reduction of the computation time as…
Pipeline parallelism is a crucial paradigm for large-scale model training. However, imbalances in memory footprint across stages can lead to significant GPU memory wastage, limiting the model sizes that pipeline parallelism can effectively…
This paper presents a Graphics Processing Units (GPUs) acceleration method of an iterative scheme for gas-kinetic model equations. Unlike the previous GPU parallelization of explicit kinetic schemes, this work features a fast converging…