Related papers: Strassen's Algorithm for Tensor Contraction

Design of a high-performance GEMM-like Tensor-Tensor Multiplication

We present "GEMM-like Tensor-Tensor multiplication" (GETT), a novel approach to tensor contractions that mirrors the design of a high-performance general matrix-matrix multiplication (GEMM). The critical insight behind GETT is the…

Mathematical Software · Computer Science 2017-11-08 Paul Springer, Paolo Bientinesi

Generating Families of Practical Fast Matrix Multiplication Algorithms

Matrix multiplication (GEMM) is a core operation to numerous scientific applications. Traditional implementations of Strassen-like fast matrix multiplication (FMM) algorithms often do not perform well except for very large matrix sizes, due…

Mathematical Software · Computer Science 2016-11-04 Jianyu Huang, Leslie Rice, Devin A. Matthews, Robert A. van de Geijn

High-Performance Tensor Contraction without Transposition

Tensor computations--in particular tensor contraction (TC)--are important kernels in many scientific computing applications. Due to the fundamental similarity of TC to matrix multiplication (MM) and to the availability of optimized…

Mathematical Software · Computer Science 2025-03-26 Devin A. Matthews

Performance of linear solvers in tensor-train format on current multicore architectures

Tensor networks are a class of algorithms aimed at reducing the computational complexity of high-dimensional problems. They are used in an increasing number of applications, from quantum simulations to machine learning. Exploiting data…

Numerical Analysis · Mathematics 2024-10-25 Melven Röhrig-Zöllner, Manuel Joey Becklas, Jonas Thies, Achim Basermann

Fast and Practical Strassen's Matrix Multiplication using FPGAs

Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of $\mathcal{O}(n^3)$ for $n\times n$…

Hardware Architecture · Computer Science 2024-06-05 Afzal Ahmad, Linfeng Du, Wei Zhang

Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric Tensors

Symmetric tensor operations arise in a wide variety of computations. However, the benefits of exploiting symmetry in order to reduce storage and computation are in conflict with a desire to simplify memory access patterns. In this paper, we…

Numerical Analysis · Mathematics 2014-10-21 Martin D. Schatz, Tze Meng Low, Robert A. van de Geijn, Tamara G. Kolda

Implementing Strassen's Algorithm with CUTLASS on NVIDIA Volta GPUs

Conventional GPU implementations of Strassen's algorithm (Strassen) typically rely on the existing high-performance matrix multiplication (GEMM), trading space for time. As a result, such approaches can only achieve practical speedup for…

Mathematical Software · Computer Science 2018-08-27 Jianyu Huang, Chenhan D. Yu, Robert A. van de Geijn

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models

The memory capacity of embedding tables in deep learning recommendation models (DLRMs) is increasing dramatically from tens of GBs to TBs across the industry. Given the fast growth in DLRMs, novel solutions are urgently needed, in order to…

Machine Learning · Computer Science 2021-01-29 Chunxing Yin, Bilge Acun, Xing Liu, Carole-Jean Wu

Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores

General-purpose Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel in scientific computing and deep learning. The emergence of new matrix computation units such as Tensor Cores (TCs) brings more opportunities for SpMM…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-17 Haisha Zhao, San Li, Jiaheng Wang, Chunbao Zhou, Jue Wang, Zhikuang Xin, Shunde Li, Zhiqiang Liang, Zhijie Pan, Fang Liu, Yan Zeng, Yangang Wang, Xuebin Chi

Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs

General matrix-matrix multiplication (GEMM) is a cornerstone of AI computations, making tensor processing engines (TPEs) increasingly critical in GPUs and domain-specific architectures. Existing architectures primarily optimize dataflow or…

Hardware Architecture · Computer Science 2025-03-11 Qizhe Wu, Huawen Liang, Yuchen Gui, Zhichen Zeng, Zerong He, Linfeng Tao, Xiaotian Wang, Letian Zhao, Zhaoxi Zeng, Wei Yuan, Wei Wu, Xi Jin

TAMM: Tensor Algebra for Many-body Methods

Tensor contraction operations in computational chemistry consume significant fractions of computing time on large-scale computing platforms. The widespread use of tensor contractions between large multi-dimensional tensors in describing…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-11 Erdal Mutlu, Ajay Panyala, Nitin Gawande, Abhishek Bagusetty, Jinsung Kim, Karol Kowalski, Nicholas Bauman, Bo Peng, Jiri Brabec, Sriram Krishnamoorthy

Throughput-Distortion Computation Of Generic Matrix Multiplication: Toward A Computation Channel For Digital Signal Processing Systems

The generic matrix multiply (GEMM) function is the core element of high-performance linear algebra libraries used in many computationally-demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based…

Mathematical Software · Computer Science 2015-05-30 Davide Anastasia, Yiannis Andreopoulos

GTA: a new General Tensor Accelerator with Better Area Efficiency and Data Reuse

Recently, tensor algebra has witnessed significant applications across various domains. Each operator in tensor algebra features different computational workload and precision. However, current general accelerators, such as VPU, GPGPU, and…

Hardware Architecture · Computer Science 2024-05-06 Chenyang Ai, Lechuan Zhao, Zhijie Huang, Cangyuan Li, Xinan Wang, Ying Wang

Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions

Mathematical operators whose transformation rules constitute the building blocks of a multi-linear algebra are widely used in physics and engineering applications where they are very often represented as tensors. In the last century, thanks…

Mathematical Software · Computer Science 2013-07-09 Edoardo Di Napoli, Diego Fabregat-Traver, Gregorio Quintana-Ortì, Paolo Bientinesi

Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference

Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). The systolic array (SA) is a pipelined 2D array of processing elements…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-19 Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina

Tensor Contractions with Extended BLAS Kernels on CPU and GPU

Tensor contractions constitute a key computational ingredient of numerical multi-linear algebra. However, as the order and dimension of tensors grow, the time and space complexities of tensor-based computations grow quickly. Existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-15 Yang Shi, U. N. Niranjan, Animashree Anandkumar, Cris Cecka

tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI

General matrix multiplication (GEMM) is a ubiquitous computing kernel/algorithm for data processing in diverse applications, including artificial intelligence (AI) and deep learning (DL). Recent shift towards edge computing has inspired…

Hardware Architecture · Computer Science 2024-12-25 Harideep Nair, Prabhu Vellaisamy, Albert Chen, Joseph Finn, Anna Li, Manav Trivedi, John Paul Shen

A Doubly-Enhanced EM Algorithm for Model-Based Tensor Clustering

Modern scientific studies often collect data sets in the forms of tensors, which call for innovative statistical analysis methods. In particular, there is a pressing need for tensor clustering methods to understand the heterogeneity in the…

Methodology · Statistics 2021-04-27 Qing Mai, Xin Zhang, Yuqing Pan, Kai Deng

Reducing Computational Complexity of Tensor Contractions via Tensor-Train Networks

There is a significant expansion in both volume and range of applications along with the concomitant increase in the variety of data sources. These ever-expanding trends have highlighted the necessity for more versatile analysis tools that…

Numerical Analysis · Mathematics 2021-09-09 Ilya Kisil, Giuseppe G. Calvi, Kriton Konstantinidis, Yao Lei Xu, Danilo P. Mandic

Accelerating Graph Neural Networks with a Novel Matrix Compression Format

The inference and training stages of Graph Neural Networks (GNNs) are often dominated by the time required to compute a long sequence of matrix multiplications between the sparse graph adjacency matrix and its embedding. To accelerate these…

Data Structures and Algorithms · Computer Science 2024-09-05 João N. F. Alves, Samir Moustafa, Siegfried Benkner, Alexandre P. Francisco, Wilfried N. Gansterer, Luís M. S. Russo