Related papers: Software for Sparse Tensor Decomposition on Emergi…

Dynasor: A Dynamic Memory Layout for Accelerating Sparse MTTKRP for Tensor Decomposition on Multi-core CPU

Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the most time-consuming compute kernel in sparse tensor decomposition. In this paper, we introduce a novel algorithm to minimize the execution time of spMTTKRP across all modes…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-17 Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

Sparse MTTKRP Acceleration for Tensor Decomposition on GPU

Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the bottleneck kernel of sparse tensor decomposition. In this work, we propose a GPU-based algorithm design to address the key challenges in accelerating spMTTKRP computation,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-15 Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

Accelerating Sparse MTTKRP for Small Tensor Decomposition on GPU

Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the bottleneck kernel of sparse tensor decomposition. In tensor decomposition, spMTTKRP is performed iteratively along all the modes of an input tensor. In this work, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-25 Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

Towards Programmable Memory Controller for Tensor Decomposition

Tensor decomposition has become an essential tool in many data science applications. Sparse Matricized Tensor Times Khatri-Rao Product (MTTKRP) is the pivotal kernel in tensor decomposition algorithms that decompose higher-order real-world…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-19 Sasindu Wijeratne, Ta-Yang Wang, Rajgopal Kannan, Viktor Prasanna

Load-Balanced Sparse MTTKRP on GPUs

Sparse matricized tensor times Khatri-Rao product (MTTKRP) is one of the most computationally expensive kernels in sparse tensor computations. This work focuses on optimizing the MTTKRP operation on GPUs, addressing both performance and…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-09 Israt Nisa, Jiajia Li, Aravind Sukumaran-Rajam, Richard Vuduc, P. Sadayappan

Reconfigurable Low-latency Memory System for Sparse Matricized Tensor Times Khatri-Rao Product on FPGA

Tensor decomposition has become an essential tool in many applications in various domains, including machine learning. Sparse Matricized Tensor Times Khatri-Rao Product (MTTKRP) is one of the most computationally expensive kernels in tensor…

Hardware Architecture · Computer Science 2021-09-21 Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

Analyzing the Performance Portability of Tensor Decomposition

We employ pressure point analysis and roofline modeling to identify performance bottlenecks and determine an upper bound on the performance of the Canonical Polyadic Alternating Poisson Regression Multiplicative Update (CP-APR MU) algorithm…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-10 S. Isaac Geronimo Anderson, Keita Teranishi, Daniel M. Dunlavy, Jee Choi

Efficient, Out-of-Memory Sparse MTTKRP on Massively Parallel Architectures

Tensor decomposition (TD) is an important method for extracting latent information from high-dimensional (multi-modal) sparse data. This study presents a novel framework for accelerating fundamental TD operations on massively parallel GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-29 Andy Nguyen, Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Jesmin Jahan Tithi, Yongseok Soh, Teresa Ranadive, Fabrizio Petrini, Jee W. Choi

AMPED: Accelerating MTTKRP for Billion-Scale Sparse Tensor Decomposition on Multiple GPUs

Matricized Tensor Times Khatri-Rao Product (MTTKRP) is the computational bottleneck in sparse tensor decomposition. As real-world sparse tensors grow to billions of nonzeros, they increasingly demand higher memory capacity and compute…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-12 Sasindu Wijeratne, Rajgopal Kannan, Viktor Prasanna

Parallel Nonnegative CP Decomposition of Dense Tensors

The CP tensor decomposition is a low-rank approximation of a tensor. We present a distributed-memory parallel algorithm and implementation of an alternating optimization method for computing a CP decomposition of dense tensor data that can…

Numerical Analysis · Computer Science 2018-06-22 Grey Ballard, Koby Hayashi, Ramakrishnan Kannan

A Performance Portable Matrix Free Dense MTTKRP in GenTen

We extend the GenTen tensor decomposition package by introducing an accelerated dense matricized tensor times Khatri-Rao product (MTTKRP), the workhorse kernel for canonical polyadic (CP) tensor decompositions, that is portable and…

Mathematical Software · Computer Science 2025-10-17 Gabriel Kosmacher, Eric T. Phipps, Sivasankaran Rajamanickam

Shared Memory Parallelization of MTTKRP for Dense Tensors

The matricized-tensor times Khatri-Rao product (MTTKRP) is the computational bottleneck for algorithms computing CP decompositions of tensors. In this paper, we develop shared-memory parallel algorithms for MTTKRP involving dense tensors.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-08-31 Koby Hayashi, Grey Ballard, Jeffrey Jiang, Michael Tobia

PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration

Sparse tensors are the most used representation of sparse multidimensional data. Operations that decompose them, selecting their most important features while reducing their dimension, have become prevalent procedures in machine learning.…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Daniel Pacheco, Leonel Sousa, Aleksandar Ilic

Sparse Tucker Tensor Decomposition on a Hybrid FPGA-CPU Platform

Recommendation systems, social network analysis, medical imaging, and data mining often involve processing sparse high-dimensional data. Such high-dimensional data are naturally represented as tensors, and they cannot be efficiently…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-22 Weiyun Jiang, Kaiqi Zhang, Colin Yu Lin, Feng Xing, Zheng Zhang

Kokkos Kernels: Performance Portable Sparse/Dense Linear Algebra and Graph Kernels

As hardware architectures are evolving in the push towards exascale, developing Computational Science and Engineering (CSE) applications depend on performance portable approaches for sustainable software development. This paper describes…

Mathematical Software · Computer Science 2021-03-23 Sivasankaran Rajamanickam, Seher Acer, Luc Berger-Vergiat, Vinh Dang, Nathan Ellingwood, Evan Harvey, Brian Kelley, Christian R. Trott, Jeremiah Wilke, Ichitaro Yamazaki

Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-Iteration

This paper presents a code generator for sparse tensor contraction computations. It leverages a mathematical representation of loop nest computations in the sparse polyhedral framework (SPF), which extends the polyhedral model to support…

Programming Languages · Computer Science 2022-08-26 Tuowen Zhao, Tobi Popoola, Mary Hall, Catherine Olschanowsky, Michelle Mills Strout

Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution

Sparse tensor algebra is challenging to efficiently parallelize due to the irregular, data-dependent, and potentially skewed structure of sparse computation. We propose the first partitioning algorithm that provably load balances the…

Programming Languages · Computer Science 2026-04-23 Atharva Chougule, Alexander J Root, Rubens Lacouture, Bobby Yan, Rohan Yadav, Fredrik Kjolstad

Minimum Cost Loop Nests for Contraction of a Sparse Tensor with a Tensor Network

Sparse tensor decomposition and completion are common in numerous applications, ranging from machine learning to computational quantum chemistry. Typically, the main bottleneck in optimization of these models are contractions of a single…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-17 Raghavendra Kanakagiri, Edgar Solomonik

A Parallel Sparse Tensor Benchmark Suite on CPUs and GPUs

Tensor computations present significant performance challenges that impact a wide spectrum of applications ranging from machine learning, healthcare analytics, social network analysis, data mining to quantum chemistry and signal processing.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-06 Jiajia Li, Mahesh Lakshminarasimhan, Xiaolong Wu, Ang Li, Catherine Olschanowsky, Kevin Barker

Rapid Exploration of Optimization Strategies on Advanced Architectures using TestSNAP and LAMMPS

The exascale race is at an end with the announcement of the Aurora and Frontier machines. This next generation of supercomputers utilize diverse hardware architectures to achieve their compute performance, providing an added onus on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-26 Rahulkumar Gayatri, Stan Moore, Evan Weinberg, Nicholas Lubbers, Sarah Anderson, Jack Deslippe, Danny Perez, Aidan P. Thompson