Related papers: EN-T: Optimizing Tensor Computing Engines Performa…

Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs

General matrix-matrix multiplication (GEMM) is a cornerstone of AI computations, making tensor processing engines (TPEs) increasingly critical in GPUs and domain-specific architectures. Existing architectures primarily optimize dataflow or…

Hardware Architecture · Computer Science 2025-03-11 Qizhe Wu , Huawen Liang , Yuchen Gui , Zhichen Zeng , Zerong He , Linfeng Tao , Xiaotian Wang , Letian Zhao , Zhaoxi Zeng , Wei Yuan , Wei Wu , Xi Jin

GTA: a new General Tensor Accelerator with Better Area Efficiency and Data Reuse

Recently, tensor algebra have witnessed significant applications across various domains. Each operator in tensor algebra features different computational workload and precision. However, current general accelerators, such as VPU, GPGPU, and…

Hardware Architecture · Computer Science 2024-05-06 Chenyang Ai , Lechuan Zhao , Zhijie Huang , Cangyuan Li , Xinan Wang , Ying Wang

Optimization of path-integral tensor-multiplication schemes in open quantum systems

Path-integral techniques are a powerful tool used in open quantum systems to provide an exact solution for the non-Markovian dynamics. However, the exponential scaling of the tensor size with quantum memory length of these techniques limits…

Quantum Physics · Physics 2025-08-25 L. M. J. Hall , A. Gisdakis , E. A. Muljarov

Performance of linear solvers in tensor-train format on current multicore architectures

Tensor networks are a class of algorithms aimed at reducing the computational complexity of high-dimensional problems. They are used in an increasing number of applications, from quantum simulations to machine learning. Exploiting data…

Numerical Analysis · Mathematics 2024-10-25 Melven Röhrig-Zöllner , Manuel Joey Becklas , Jonas Thies , Achim Basermann

Tensor Computation: A New Framework for High-Dimensional Problems in EDA

Many critical EDA problems suffer from the curse of dimensionality, i.e. the very fast-scaling computational burden produced by large number of parameters and/or unknown variables. This phenomenon may be caused by multiple spatial or…

Numerical Analysis · Computer Science 2016-11-18 Zheng Zhang , Kim Batselier , Haotian Liu , Luca Daniel , Ngai Wong

Comprehensive Design Space Exploration for Tensorized Neural Network Hardware Accelerators

High-order tensor decomposition has been widely adopted to obtain compact deep neural networks for edge deployment. However, existing studies focus primarily on its algorithmic advantages such as accuracy and compression ratio-while…

Hardware Architecture · Computer Science 2025-11-26 Jinsong Zhang , Minghe Li , Jiayi Tian , Jinming Lu , Zheng Zhang

A Computational Model for Tensor Core Units

To respond to the need of efficient training and inference of deep neural networks, a plethora of domain-specific hardware architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature…

Data Structures and Algorithms · Computer Science 2020-07-10 Rezaul Chowdhury , Francesco Silvestri , Flavio Vella

Tensor Memory Engine: On-the-fly Data Reorganization for Ideal Locality

The shift to data-intensive processing from the cloud to the edge has introduced new challenges and expectations for the next generation of intelligent computing systems. As the memory wall continues to grow, modern systems can only meet…

Hardware Architecture · Computer Science 2026-04-16 Denis Hoornaert , Cole Strickler , Manos Athanassoulis , Marco Caccamo , Heechul Yun , Renato Mancuso

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen , Lianmin Zheng , Eddie Yan , Ziheng Jiang , Thierry Moreau , Luis Ceze , Carlos Guestrin , Arvind Krishnamurthy

Optimizing Tensor Network Partitioning using Simulated Annealing

Tensor networks have proven to be a valuable tool, for instance, in the classical simulation of (strongly correlated) quantum systems. As the size of the systems increases, contracting larger tensor networks becomes computationally…

Quantum Physics · Physics 2025-07-29 Manuel Geiger , Qunsheng Huang , Christian B. Mendl

Accelerating Reduction and Scan Using Tensor Core Units

Driven by deep learning, there has been a surge of specialized processors for matrix multiplication, referred to as TensorCore Units (TCUs). These TCUs are capable of performing matrix multiplications on small matrices (usually 4x4 or…

Performance · Computer Science 2019-11-26 Abdul Dakkak , Cheng Li , Isaac Gelado , Jinjun Xiong , Wen-mei Hwu

tenSVD algorithm for compression

Tensors provide a robust framework for managing high-dimensional data. Consequently, tensor analysis has emerged as an active research area in various domains, including machine learning, signal processing, computer vision, graph analysis,…

Computation · Statistics 2025-10-01 Michele Gallo

Quantum-based Molecular Dynamics Simulations Using Tensor Cores

Tensor cores, along with tensor processing units, represent a new form of hardware acceleration specifically designed for deep neural network calculations in artificial intelligence applications. Tensor cores provide extraordinary…

Computational Physics · Physics 2021-09-14 Joshua Finkelstein , Justin S. Smith , Susan M. Mniszewski , Kipton Barros , Christian F. A. Negre , Emanuel H. Rubensson , Anders M. N. Niklasson

Tensor Quantum Programming

Running quantum algorithms often involves implementing complex quantum circuits with such a large number of multi-qubit gates that the challenge of tackling practical applications appears daunting. To date, no experiments have successfully…

Quantum Physics · Physics 2024-03-21 A. Termanova , Ar. Melnikov , E. Mamenchikov , N. Belokonev , S. Dolgov , A. Berezutskii , R. Ellerbrock , C. Mansell , M. Perelshtein

Computing Large-Scale Matrix and Tensor Decomposition with Structured Factors: A Unified Nonconvex Optimization Perspective

The proposed article aims at offering a comprehensive tutorial for the computational aspects of structured matrix and tensor factorization. Unlike existing tutorials that mainly focus on {\it algorithmic procedures} for a small set of…

Signal Processing · Electrical Eng. & Systems 2023-07-19 Xiao Fu , Nico Vervliet , Lieven De Lathauwer , Kejun Huang , Nicolas Gillis

Tensor Network Estimation of Distribution Algorithms

Tensor networks are a tool first employed in the context of many-body quantum physics that now have a wide range of uses across the computational sciences, from numerical methods to machine learning. Methods integrating tensor networks into…

Machine Learning · Computer Science 2026-04-27 John Gardiner , Javier Lopez-Piqueres

Tucker Tensor Decomposition on FPGA

Tensor computation has emerged as a powerful mathematical tool for solving high-dimensional and/or extreme-scale problems in science and engineering. The last decade has witnessed tremendous advancement of tensor computation and its…

Signal Processing · Electrical Eng. & Systems 2019-07-05 Kaiqi Zhang , Xiyuan Zhang , Zheng Zhang

In-Datacenter Performance Analysis of a Tensor Processing Unit

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that…

Hardware Architecture · Computer Science 2017-04-18 Norman P. Jouppi , Cliff Young , Nishant Patil , David Patterson , Gaurav Agrawal , Raminder Bajwa , Sarah Bates , Suresh Bhatia , Nan Boden , Al Borchers , Rick Boyle , Pierre-luc Cantin , Clifford Chao , Chris Clark , Jeremy Coriell , Mike Daley , Matt Dau , Jeffrey Dean , Ben Gelb , Tara Vazir Ghaemmaghami , Rajendra Gottipati , William Gulland , Robert Hagmann , C. Richard Ho , Doug Hogberg , John Hu , Robert Hundt , Dan Hurt , Julian Ibarz , Aaron Jaffey , Alek Jaworski , Alexander Kaplan , Harshit Khaitan , Andy Koch , Naveen Kumar , Steve Lacy , James Laudon , James Law , Diemthu Le , Chris Leary , Zhuyuan Liu , Kyle Lucke , Alan Lundin , Gordon MacKean , Adriana Maggiore , Maire Mahony , Kieran Miller , Rahul Nagarajan , Ravi Narayanaswami , Ray Ni , Kathy Nix , Thomas Norrie , Mark Omernick , Narayana Penukonda , Andy Phelps , Jonathan Ross , Matt Ross , Amir Salek , Emad Samadiani , Chris Severn , Gregory Sizikov , Matthew Snelham , Jed Souter , Dan Steinberg , Andy Swing , Mercedes Tan , Gregory Thorson , Bo Tian , Horia Toma , Erick Tuttle , Vijay Vasudevan , Richard Walter , Walter Wang , Eric Wilcox , Doe Hyun Yoon

Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance

Tensor Core is a mixed-precision matrix-matrix multiplication unit on NVIDIA GPUs with a theoretical peak performance of more than 300 TFlop/s on Ampere architectures. Tensor Cores were developed in response to the high demand of dense…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-19 Hiroyuki Ootomo , Rio Yokota

Reducing shared memory footprint to leverage high throughput on Tensor Cores and its flexible API extension library

NVIDIA Tensor Core is a mixed-precision matrix-matrix multiplication and addition computing unit, where the theoretical peak performance is more than 300 TFlop/s on NVIDIA A100 GPU. NVIDIA provides WMMA API for using Tensor Cores in custom…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-30 Hiroyuki Ootomo , Rio Yokota