Related papers: Multi-GPU implementation of a VMAT treatment plan …
Purpose: To develop a novel aperture-based algorithm for volumetric modulated arc therapy (VMAT) treatment plan optimization with high quality and high efficiency. Methods: The VMAT optimization problem is formulated as a large-scale convex…
In the treatment plan optimization for intensity modulated radiation therapy (IMRT), dose-deposition coefficient (DDC) matrix is often pre-computed to parameterize the dose contribution to each voxel in the volume of interest from each…
A multiply-accumulate (MAC) operation is the main computation unit for DSP applications. DSP blocks are one of the efficient solutions to implement MACs in FPGA's. However, since the DSP blocks have wide multiplier and adder blocks, MAC…
A tridiagonal matrix algorithm (TDMA), Pipelined-TDMA, is developed for multi-GPU systems to resolve the scalability bottlenecks caused by the sequential structure of conventional divide-and-conquer TDMA. The proposed method pipelines…
I present HPRMAT, a high-performance solver library for the linear systems arising in R-matrix coupled-channel scattering calculations in nuclear physics. Designed as a drop-in replacement for the linear algebra routines in existing…
Matrix Factorization (MF) has been widely applied in machine learning and data mining. A large number of algorithms have been studied to factorize matrices. Among them, stochastic gradient descent (SGD) is a commonly used method.…
Matrix decompositions are ubiquitous in machine learning, including applications in dimensionality reduction, data compression and deep learning algorithms. Typical solutions for matrix decompositions have polynomial complexity which…
Matrix Factorization (MF) on large scale data takes substantial time on a Central Processing Unit (CPU). While Graphical Processing Unit (GPU)s could expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs…
Traditional VMAT optimization often ignores dynamic machine limits, treating delivery time as an emergent property rather than a steerable parameter. This work introduces Dynamic Modulated Arc Therapy (DMAT), an intent-driven framework that…
Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…
The increasing use of heterogeneous embedded systems with multi-core CPUs and Graphics Processing Units (GPUs) presents important challenges in effectively exploiting pipeline, task and data-level parallelism to meet throughput requirements…
Purpose: To make the planning of volumetric modulated arc therapy (VMAT) faster and to explore the tradeoffs between planning objectives and delivery efficiency. Methods: A convex multicriteria dose optimization problem is solved for an…
Matrix-accelerated stencil computation is a hot research topic, yet its application to three-dimensional (3D) high-order stencils and HPC remains underexplored. With the emergence of matrix units on multicore CPUs, we analyze matrix-based…
The exponential growth of computational workloads is surpassing the capabilities of conventional architectures, which are constrained by fundamental limits. In-memory computing (IMC) with RRAM provides a promising alternative by providing…
In recent years, volumetric modulated arc therapy (VMAT) has been becoming a more and more important radiation technique widely used in clinical application for cancer treatment. One of the key problems in VMAT is treatment plan…
We present a distributed framework of the Primal-Dual Hybrid Gradient (PDHG) algorithm for solving massive-scale linear programming (LP) problems. Although PDHG-based solvers demonstrate strong performance on single-node GPU architectures,…
We report numerical results on solving constrained linear-quadratic model predictive control (MPC) problems by exploiting graphics processing units (GPUs). The presented method reduces the MPC problem by eliminating the state variables and…
In this paper, we show the effectiveness of a pipeline implementation of Dynamic Programming (DP) on GPU. As an example, we explain how to solve a matrix-chain multiplication (MCM) problem by DP on GPU. This problem can be sequentially…
This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many…
Monte Carlo (MC) simulation is commonly considered to be the most accurate dose calculation method in radiotherapy. However, its efficiency still requires improvement for many routine clinical applications. In this paper, we present our…