English
Related papers

Related papers: EN-T: Optimizing Tensor Computing Engines Performa…

200 papers

General matrix-matrix multiplication (GEMM) is a cornerstone of AI computations, making tensor processing engines (TPEs) increasingly critical in GPUs and domain-specific architectures. Existing architectures primarily optimize dataflow or…

Hardware Architecture · Computer Science 2025-03-11 Qizhe Wu , Huawen Liang , Yuchen Gui , Zhichen Zeng , Zerong He , Linfeng Tao , Xiaotian Wang , Letian Zhao , Zhaoxi Zeng , Wei Yuan , Wei Wu , Xi Jin

Recently, tensor algebra have witnessed significant applications across various domains. Each operator in tensor algebra features different computational workload and precision. However, current general accelerators, such as VPU, GPGPU, and…

Hardware Architecture · Computer Science 2024-05-06 Chenyang Ai , Lechuan Zhao , Zhijie Huang , Cangyuan Li , Xinan Wang , Ying Wang

Path-integral techniques are a powerful tool used in open quantum systems to provide an exact solution for the non-Markovian dynamics. However, the exponential scaling of the tensor size with quantum memory length of these techniques limits…

Quantum Physics · Physics 2025-08-25 L. M. J. Hall , A. Gisdakis , E. A. Muljarov

Tensor networks are a class of algorithms aimed at reducing the computational complexity of high-dimensional problems. They are used in an increasing number of applications, from quantum simulations to machine learning. Exploiting data…

Numerical Analysis · Mathematics 2024-10-25 Melven Röhrig-Zöllner , Manuel Joey Becklas , Jonas Thies , Achim Basermann

Many critical EDA problems suffer from the curse of dimensionality, i.e. the very fast-scaling computational burden produced by large number of parameters and/or unknown variables. This phenomenon may be caused by multiple spatial or…

Numerical Analysis · Computer Science 2016-11-18 Zheng Zhang , Kim Batselier , Haotian Liu , Luca Daniel , Ngai Wong

High-order tensor decomposition has been widely adopted to obtain compact deep neural networks for edge deployment. However, existing studies focus primarily on its algorithmic advantages such as accuracy and compression ratio-while…

Hardware Architecture · Computer Science 2025-11-26 Jinsong Zhang , Minghe Li , Jiayi Tian , Jinming Lu , Zheng Zhang

To respond to the need of efficient training and inference of deep neural networks, a plethora of domain-specific hardware architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature…

Data Structures and Algorithms · Computer Science 2020-07-10 Rezaul Chowdhury , Francesco Silvestri , Flavio Vella

The shift to data-intensive processing from the cloud to the edge has introduced new challenges and expectations for the next generation of intelligent computing systems. As the memory wall continues to grow, modern systems can only meet…

Hardware Architecture · Computer Science 2026-04-16 Denis Hoornaert , Cole Strickler , Manos Athanassoulis , Marco Caccamo , Heechul Yun , Renato Mancuso

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen , Lianmin Zheng , Eddie Yan , Ziheng Jiang , Thierry Moreau , Luis Ceze , Carlos Guestrin , Arvind Krishnamurthy

Tensor networks have proven to be a valuable tool, for instance, in the classical simulation of (strongly correlated) quantum systems. As the size of the systems increases, contracting larger tensor networks becomes computationally…

Quantum Physics · Physics 2025-07-29 Manuel Geiger , Qunsheng Huang , Christian B. Mendl

Driven by deep learning, there has been a surge of specialized processors for matrix multiplication, referred to as TensorCore Units (TCUs). These TCUs are capable of performing matrix multiplications on small matrices (usually 4x4 or…

Performance · Computer Science 2019-11-26 Abdul Dakkak , Cheng Li , Isaac Gelado , Jinjun Xiong , Wen-mei Hwu

Tensors provide a robust framework for managing high-dimensional data. Consequently, tensor analysis has emerged as an active research area in various domains, including machine learning, signal processing, computer vision, graph analysis,…

Computation · Statistics 2025-10-01 Michele Gallo

Tensor cores, along with tensor processing units, represent a new form of hardware acceleration specifically designed for deep neural network calculations in artificial intelligence applications. Tensor cores provide extraordinary…

Running quantum algorithms often involves implementing complex quantum circuits with such a large number of multi-qubit gates that the challenge of tackling practical applications appears daunting. To date, no experiments have successfully…

The proposed article aims at offering a comprehensive tutorial for the computational aspects of structured matrix and tensor factorization. Unlike existing tutorials that mainly focus on {\it algorithmic procedures} for a small set of…

Signal Processing · Electrical Eng. & Systems 2023-07-19 Xiao Fu , Nico Vervliet , Lieven De Lathauwer , Kejun Huang , Nicolas Gillis

Tensor networks are a tool first employed in the context of many-body quantum physics that now have a wide range of uses across the computational sciences, from numerical methods to machine learning. Methods integrating tensor networks into…

Machine Learning · Computer Science 2026-04-27 John Gardiner , Javier Lopez-Piqueres

Tensor computation has emerged as a powerful mathematical tool for solving high-dimensional and/or extreme-scale problems in science and engineering. The last decade has witnessed tremendous advancement of tensor computation and its…

Signal Processing · Electrical Eng. & Systems 2019-07-05 Kaiqi Zhang , Xiyuan Zhang , Zheng Zhang

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that…

Tensor Core is a mixed-precision matrix-matrix multiplication unit on NVIDIA GPUs with a theoretical peak performance of more than 300 TFlop/s on Ampere architectures. Tensor Cores were developed in response to the high demand of dense…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-19 Hiroyuki Ootomo , Rio Yokota

NVIDIA Tensor Core is a mixed-precision matrix-matrix multiplication and addition computing unit, where the theoretical peak performance is more than 300 TFlop/s on NVIDIA A100 GPU. NVIDIA provides WMMA API for using Tensor Cores in custom…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-30 Hiroyuki Ootomo , Rio Yokota
‹ Prev 1 2 3 10 Next ›