In this paper, we show the effectiveness of a pipeline implementation of Dynamic Programming (DP) on a GPU. As an example, we explain how to solve the matrix-chain multiplication (MCM) problem by DP on a GPU. This problem can be solved sequentially in O(n^3) steps by DP, where n is the number of matrices, because its solution table is of size n×n and each element of the table can be computed in O(n) steps. A typical speedup strategy is to parallelize the O(n)-step computation of each element, which can be easily achieved by parallel prefix computation, i.e., an O(log n)-step computation with n threads in a tournament fashion. With this standard parallelization method, the MCM problem can be solved in O(n^2 log n) steps with n threads. In our approach, we solve the MCM problem on a GPU in a pipeline fashion, i.e., we use GPU cores as pipeline stages so that many elements of the solution table are partially computed in parallel at the same time. Our implementation determines one output value per computational step with n threads in a pipeline fashion and constructs the entire solution table in O(n^2) steps with n threads.
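For orientation, the sequential O(n^3) baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of the textbook MCM recurrence, not the authors' GPU pipeline; the function name `matrix_chain_order` and the `dims` input convention (matrix i has shape dims[i] x dims[i + 1]) are assumptions made for this example.

```python
def matrix_chain_order(dims):
    """Return the minimum number of scalar multiplications for a matrix chain.

    dims has length n + 1; matrix i has shape dims[i] x dims[i + 1].
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    n = len(dims) - 1
    # cost[i][j]: cheapest way to multiply matrices i..j (inclusive).
    # The n x n table is the "solution table" of the DP.
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):           # chain length, short to long
        for i in range(n - length + 1):
            j = i + length - 1
            # Each entry takes O(n) steps: try every split point k.
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1] if n > 0 else 0


# Three matrices of shapes 10x30, 30x5, 5x60.
print(matrix_chain_order([10, 30, 5, 60]))  # 4500
```

The inner `min` over k is the O(n)-step part that the standard approach parallelizes per table entry; the pipeline approach described in the abstract instead overlaps the partial computation of many entries.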
@article{arxiv.2008.01938,
title = {Solving Dynamic Programming Problem by Pipeline Implementation on GPU},
author = {Susumu Matsumae and Makoto Miyazaki},
journal = {arXiv preprint arXiv:2008.01938},
year = {2020}
}