Related papers: Multi-GPU implementation of a VMAT treatment plan …

Ultra-fast treatment plan optimization for volumetric modulated arc therapy (VMAT)

Purpose: To develop a novel aperture-based algorithm for volumetric modulated arc therapy (VMAT) treatment plan optimization with high quality and high efficiency. Methods: The VMAT optimization problem is formulated as a large-scale convex…

Medical Physics · Physics · 2015-05-19 · Chunhua Men, H. Edwin Romeijn, Xun Jia, Steve B. Jiang

The fixed-point iteration method for IMRT optimization with truncated dose deposition coefficient matrix

In the treatment plan optimization for intensity modulated radiation therapy (IMRT), dose-deposition coefficient (DDC) matrix is often pre-computed to parameterize the dose contribution to each voxel in the volume of interest from each…

Medical Physics · Physics · 2013-03-15 · Zhen Tian, Masoud Zarepisheh, Xun Jia, Steve B. Jiang

Near-Precise Parameter Approximation for Multiple Multiplications on A Single DSP Block

A multiply-accumulate (MAC) operation is the main computation unit for DSP applications. DSP blocks are one of the efficient solutions to implement MACs in FPGAs. However, since the DSP blocks have wide multiplier and adder blocks, MAC…

Hardware Architecture · Computer Science · 2021-10-26 · Ercan Kalali, Rene van Leuken

A Highly Scalable TDMA for GPUs and Its Application to Flow Solver Optimization

A tridiagonal matrix algorithm (TDMA), Pipelined-TDMA, is developed for multi-GPU systems to resolve the scalability bottlenecks caused by the sequential structure of conventional divide-and-conquer TDMA. The proposed method pipelines…

Computational Physics · Physics · 2025-09-05 · Seungchan Kim, Jihoo Kim, Sanghyun Ha, Donghyun You

HPRMAT: A high-performance R-matrix solver with GPU acceleration for coupled-channel problems in nuclear physics

I present HPRMAT, a high-performance solver library for the linear systems arising in R-matrix coupled-channel scattering calculations in nuclear physics. Designed as a drop-in replacement for the linear algebra routines in existing…

Computational Physics · Physics · 2025-12-15 · Jin Lei

Efficient Matrix Factorization on Heterogeneous CPU-GPU Systems

Matrix Factorization (MF) has been widely applied in machine learning and data mining. A large number of algorithms have been studied to factorize matrices. Among them, stochastic gradient descent (SGD) is a commonly used method.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-06-30 · Yuanhang Yu, Dong Wen, Ying Zhang, Xiaoyang Wang, Wenjie Zhang, Xuemin Lin

Efficient GPU implementation of randomized SVD and its applications

Matrix decompositions are ubiquitous in machine learning, including applications in dimensionality reduction, data compression and deep learning algorithms. Typical solutions for matrix decompositions have polynomial complexity which…

Machine Learning · Computer Science · 2024-03-13 · Łukasz Struski, Paweł Morkisz, Przemysław Spurek, Samuel Rodriguez Bernabeu, Tomasz Trzciński

GPU accelerated matrix factorization of large scale data using block based approach

Matrix Factorization (MF) on large scale data takes substantial time on a Central Processing Unit (CPU). While Graphics Processing Units (GPUs) could expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs…

Machine Learning · Computer Science · 2023-04-28 · Prasad Bhavana, Vineet Padmanabhan

Dynamic Modulated Arc Therapy (DMAT): An Intent-Driven, Time-Aware Framework for Next-Generation Radiotherapy Delivery

Traditional VMAT optimization often ignores dynamic machine limits, treating delivery time as an emergent property rather than a steerable parameter. This work introduces Dynamic Modulated Arc Therapy (DMAT), an intent-driven framework that…

Medical Physics · Physics · 2026-05-14 · Taoran Li, Esa Kuusela, Emmi Ruokokoski, Heini Hyvönen, Jerry Jaboin, Mirko Myllykoski, Jussi Nurminen, Riku Paananen, Jarkko Peltola, Marko Rusanen, Martin Sabel, Kevin Moore, Christopher Boylan

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-07-30 · Mufakir Qamar Ansari, Mudabir Qamar Ansari

Memory-constrained Vectorization and Scheduling of Dataflow Graphs for Hybrid CPU-GPU Platforms

The increasing use of heterogeneous embedded systems with multi-core CPUs and Graphics Processing Units (GPUs) presents important challenges in effectively exploiting pipeline, task and data-level parallelism to meet throughput requirements…

Signal Processing · Electrical Eng. & Systems · 2017-12-01 · Shuoxin Lin, Jiahao Wu, Shuvra S. Bhattacharyya

Multicriteria VMAT optimization

Purpose: To make the planning of volumetric modulated arc therapy (VMAT) faster and to explore the tradeoffs between planning objectives and delivery efficiency. Methods: A convex multicriteria dose optimization problem is solved for an…

Medical Physics · Physics · 2015-05-28 · David Craft, Dualta McQuaid, Jeremiah Wala, Wei Chen, Ehsan Salari, Thomas Bortfeld

MMStencil: Optimizing High-order Stencils on Multicore CPU using Matrix Unit

Matrix-accelerated stencil computation is a hot research topic, yet its application to three-dimensional (3D) high-order stencils and HPC remains underexplored. With the emergence of matrix units on multicore CPUs, we analyze matrix-based…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-07-16 · Yinuo Wang, Tianqi Mao, Lin Gan, Wubing Wan, Zeyu Song, Jiayu Fu, Lanke He, Wenqiang Wang, Zekun Yin, Wei Xue, Guangwen Yang

From GPUs to RRAMs: Distributed In-Memory Primal-Dual Hybrid Gradient Method for Solving Large-Scale Linear Optimization Problem

The exponential growth of computational workloads is surpassing the capabilities of conventional architectures, which are constrained by fundamental limits. In-memory computing (IMC) with RRAM provides a promising alternative by providing…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-12-09 · Huynh Q. N. Vo, Md Tawsif Rahman Chowdhury, Paritosh Ramanan, Gozde Tutuncuoglu, Junchi Yang, Feng Qiu, Murat Yildirim

Randomized Algorithms For High Quality Treatment Planning in Volumetric Modulated Arc Therapy

In recent years, volumetric modulated arc therapy (VMAT) has become an increasingly important radiation technique, widely used in clinical cancer treatment. One of the key problems in VMAT is treatment plan…

Medical Physics · Physics · 2016-01-06 · Yu Yang, Bin Dong, Zaiwen Wen

D-PDLP: Scaling PDLP to Distributed Multi-GPU Systems

We present a distributed framework of the Primal-Dual Hybrid Gradient (PDHG) algorithm for solving massive-scale linear programming (LP) problems. Although PDHG-based solvers demonstrate strong performance on single-node GPU architectures,…

Optimization and Control · Mathematics · 2026-05-11 · Hongpei Li, Yicheng Huang, Huikang Liu, Dongdong Ge, Yinyu Ye

Exploiting GPU/SIMD Architectures for Solving Linear-Quadratic MPC Problems

We report numerical results on solving constrained linear-quadratic model predictive control (MPC) problems by exploiting graphics processing units (GPUs). The presented method reduces the MPC problem by eliminating the state variables and…

Optimization and Control · Mathematics · 2026-05-11 · David Cole, Sungho Shin, François Pacaud, Victor M. Zavala, Mihai Anitescu

Solving Dynamic Programming Problem by Pipeline Implementation on GPU

In this paper, we show the effectiveness of a pipeline implementation of Dynamic Programming (DP) on GPU. As an example, we explain how to solve a matrix-chain multiplication (MCM) problem by DP on GPU. This problem can be sequentially…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-08-06 · Susumu Matsumae, Makoto Miyazaki

Graphics Processing Units and High-Dimensional Optimization

This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many…

Computation · Statistics · 2015-03-13 · Hua Zhou, Kenneth Lange, Marc A. Suchard

GPU-based fast Monte Carlo simulation for radiotherapy dose calculation

Monte Carlo (MC) simulation is commonly considered to be the most accurate dose calculation method in radiotherapy. However, its efficiency still requires improvement for many routine clinical applications. In this paper, we present our…

Medical Physics · Physics · 2015-05-28 · Xun Jia, Xuejun Gu, Yan Jiang Graves, Michael Folkerts, Steve B. Jiang