Related papers: Towards Programmable Memory Controller for Tensor …

Reconfigurable Low-latency Memory System for Sparse Matricized Tensor Times Khatri-Rao Product on FPGA

Tensor decomposition has become an essential tool in many applications in various domains, including machine learning. Sparse Matricized Tensor Times Khatri-Rao Product (MTTKRP) is one of the most computationally expensive kernels in tensor…

Hardware Architecture · Computer Science · 2021-09-21 · Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

Accelerating Sparse MTTKRP for Small Tensor Decomposition on GPU

Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the bottleneck kernel of sparse tensor decomposition. In tensor decomposition, spMTTKRP is performed iteratively along all the modes of an input tensor. In this work, we…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-03-25 · Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

Sparse MTTKRP Acceleration for Tensor Decomposition on GPU

Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the bottleneck kernel of sparse tensor decomposition. In this work, we propose a GPU-based algorithm design to address the key challenges in accelerating spMTTKRP computation,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-05-15 · Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

Dynasor: A Dynamic Memory Layout for Accelerating Sparse MTTKRP for Tensor Decomposition on Multi-core CPU

Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the most time-consuming compute kernel in sparse tensor decomposition. In this paper, we introduce a novel algorithm to minimize the execution time of spMTTKRP across all modes…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-10-17 · Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

Load-Balanced Sparse MTTKRP on GPUs

Sparse matricized tensor times Khatri-Rao product (MTTKRP) is one of the most computationally expensive kernels in sparse tensor computations. This work focuses on optimizing the MTTKRP operation on GPUs, addressing both performance and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-04-09 · Israt Nisa, Jiajia Li, Aravind Sukumaran-Rajam, Richard Vuduc, P. Sadayappan

Shared Memory Parallelization of MTTKRP for Dense Tensors

The matricized-tensor times Khatri-Rao product (MTTKRP) is the computational bottleneck for algorithms computing CP decompositions of tensors. In this paper, we develop shared-memory parallel algorithms for MTTKRP involving dense tensors.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-08-31 · Koby Hayashi, Grey Ballard, Jeffrey Jiang, Michael Tobia

A Performance Portable Matrix Free Dense MTTKRP in GenTen

We extend the GenTen tensor decomposition package by introducing an accelerated dense matricized tensor times Khatri-Rao product (MTTKRP), the workhorse kernel for canonical polyadic (CP) tensor decompositions, that is portable and…

Mathematical Software · Computer Science · 2025-10-17 · Gabriel Kosmacher, Eric T. Phipps, Sivasankaran Rajamanickam

AMPED: Accelerating MTTKRP for Billion-Scale Sparse Tensor Decomposition on Multiple GPUs

Matricized Tensor Times Khatri-Rao Product (MTTKRP) is the computational bottleneck in sparse tensor decomposition. As real-world sparse tensors grow to billions of nonzeros, they increasingly demand higher memory capacity and compute…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-08-12 · Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

Software for Sparse Tensor Decomposition on Emerging Computing Architectures

In this paper, we develop software for decomposing sparse tensors that is portable to and performant on a variety of multicore, manycore, and GPU computing architectures. The result is a single code whose performance matches optimized…

Mathematical Software · Computer Science · 2019-07-30 · Eric Phipps, Tamara G. Kolda

Parallel Nonnegative CP Decomposition of Dense Tensors

The CP tensor decomposition is a low-rank approximation of a tensor. We present a distributed-memory parallel algorithm and implementation of an alternating optimization method for computing a CP decomposition of dense tensor data that can…

Numerical Analysis · Computer Science · 2018-06-22 · Grey Ballard, Koby Hayashi, Ramakrishnan Kannan

Sparse Tucker Tensor Decomposition on a Hybrid FPGA-CPU Platform

Recommendation systems, social network analysis, medical imaging, and data mining often involve processing sparse high-dimensional data. Such high-dimensional data are naturally represented as tensors, and they cannot be efficiently…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-10-22 · Weiyun Jiang, Kaiqi Zhang, Colin Yu Lin, Feng Xing, Zheng Zhang

PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration

Sparse tensors are the most used representation of sparse multidimensional data. Operations that decompose them, selecting their most important features while reducing their dimension, have become prevalent procedures in machine learning.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-29 · Daniel Pacheco, Leonel Sousa, Aleksandar Ilic

Performance Modeling Sparse MTTKRP Using Optical Static Random Access Memory on FPGA

Electrical static random memory (E-SRAM) is the current standard for internal static memory in Field Programmable Gate Array (FPGA). Despite the dramatic improvement in E-SRAM technology over the past decade, the goal of ultra-fast,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-08-24 · Sasindu Wijeratne, Akhilesh Jaiswal, Ajey P. Jacob, Bingyi Zhang, Viktor Prasanna

Efficient, Out-of-Memory Sparse MTTKRP on Massively Parallel Architectures

Tensor decomposition (TD) is an important method for extracting latent information from high-dimensional (multi-modal) sparse data. This study presents a novel framework for accelerating fundamental TD operations on massively parallel GPU…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-06-29 · Andy Nguyen, Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Jesmin Jahan Tithi, Yongseok Soh, Teresa Ranadive, Fabrizio Petrini, Jee W. Choi

Tucker Tensor Decomposition on FPGA

Tensor computation has emerged as a powerful mathematical tool for solving high-dimensional and/or extreme-scale problems in science and engineering. The last decade has witnessed tremendous advancement of tensor computation and its…

Signal Processing · Electrical Eng. & Systems · 2019-07-05 · Kaiqi Zhang, Xiyuan Zhang, Zheng Zhang

Improved Analysis of Khatri-Rao Random Projections and Applications

Randomization has emerged as a powerful set of tools for large-scale matrix and tensor decompositions. Randomized algorithms involve computing sketches with random matrices. A prevalent approach is to take the random matrix as a standard…

Numerical Analysis · Mathematics · 2026-04-02 · Arvind K. Saibaba, Bhisham Dev Verma, Grey Ballard

Stochastic Gradients for Large-Scale Tensor Decomposition

Tensor decomposition is a well-known tool for multiway data analysis. This work proposes using stochastic gradients for efficient generalized canonical polyadic (GCP) tensor decomposition of large-scale tensors. GCP tensor decomposition is…

Numerical Analysis · Mathematics · 2020-11-25 · Tamara G. Kolda, David Hong

Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product

The matricized-tensor times Khatri-Rao product computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor. In order to develop high performance sequential and parallel algorithms, we establish…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-10-24 · Grey Ballard, Nicholas Knight, Kathryn Rouse

Programmable FPGA-based Memory Controller

Even with generational improvements in DRAM technology, memory access latency still remains the major bottleneck for application accelerators, primarily due to limitations in memory interface IPs which cannot fully account for variations in…

Hardware Architecture · Computer Science · 2021-08-24 · Sasindu Wijeratne, Sanket Pattnaik, Zhiyu Chen, Rajgopal Kannan, Viktor Prasanna

Adaptive Randomized Tensor Train Rounding using Khatri-Rao Products

Approximating a tensor in the tensor train (TT) format has many important applications in scientific computing. Rounding a TT tensor involves further compressing a tensor that is already in the TT format. This paper proposes new randomized…

Numerical Analysis · Mathematics · 2025-11-06 · Hussam Al Daas, Grey Ballard, Laura Grigori, Mariana Martinez Aguilar, Arvind K. Saibaba, Bhisham Dev Verma