Related papers: Proposal for a High Precision Tensor Processing Un…

Exploration of TPUs for AI Applications

Tensor Processing Units (TPUs) are specialized hardware accelerators for deep learning developed by Google. This paper aims to explore TPUs in cloud and edge computing focusing on its applications in AI. We provide an overview of TPUs,…

Hardware Architecture · Computer Science 2023-11-15 Diego Sanmartín Carrión , Vera Prohaska

Mixed precision in Graphics Processing Unit

Modern graphics computing units (GPUs) are designed and optimized to perform highly parallel numerical calculations. This parallelism has enabled (and promises) significant advantages, both in terms of energy performance and calculation. In…

Hardware Architecture · Computer Science 2021-10-26 Quentin Gallouédec

Simulation of quantum physics with Tensor Processing Units: brute-force computation of ground states and time evolution

Tensor Processing Units (TPUs) were developed by Google exclusively to support large-scale machine learning tasks. TPUs can, however, also be used to accelerate and scale up other computationally demanding tasks. In this paper we repurpose…

Quantum Physics · Physics 2021-11-23 Markus Hauru , Alan Morningstar , Jackson Beall , Martin Ganahl , Adam Lewis , Guifre Vidal

In-Datacenter Performance Analysis of a Tensor Processing Unit

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that…

Hardware Architecture · Computer Science 2017-04-18 Norman P. Jouppi , Cliff Young , Nishant Patil , David Patterson , Gaurav Agrawal , Raminder Bajwa , Sarah Bates , Suresh Bhatia , Nan Boden , Al Borchers , Rick Boyle , Pierre-luc Cantin , Clifford Chao , Chris Clark , Jeremy Coriell , Mike Daley , Matt Dau , Jeffrey Dean , Ben Gelb , Tara Vazir Ghaemmaghami , Rajendra Gottipati , William Gulland , Robert Hagmann , C. Richard Ho , Doug Hogberg , John Hu , Robert Hundt , Dan Hurt , Julian Ibarz , Aaron Jaffey , Alek Jaworski , Alexander Kaplan , Harshit Khaitan , Andy Koch , Naveen Kumar , Steve Lacy , James Laudon , James Law , Diemthu Le , Chris Leary , Zhuyuan Liu , Kyle Lucke , Alan Lundin , Gordon MacKean , Adriana Maggiore , Maire Mahony , Kieran Miller , Rahul Nagarajan , Ravi Narayanaswami , Ray Ni , Kathy Nix , Thomas Norrie , Mark Omernick , Narayana Penukonda , Andy Phelps , Jonathan Ross , Matt Ross , Amir Salek , Emad Samadiani , Chris Severn , Gregory Sizikov , Matthew Snelham , Jed Souter , Dan Steinberg , Andy Swing , Mercedes Tan , Gregory Thorson , Bo Tian , Horia Toma , Erick Tuttle , Vijay Vasudevan , Richard Walter , Walter Wang , Eric Wilcox , Doe Hyun Yoon

A Learned Performance Model for Tensor Processing Units

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for…

Performance · Computer Science 2021-03-19 Samuel J. Kaufman , Phitchaya Mangpo Phothilimthana , Yanqi Zhou , Charith Mendis , Sudip Roy , Amit Sabne , Mike Burrows

Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML…

Hardware Architecture · Computer Science 2024-07-12 Mohammed Elbtity , Peyton Chandarana , Ramtin Zand

A Computational Model for Tensor Core Units

To respond to the need of efficient training and inference of deep neural networks, a plethora of domain-specific hardware architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature…

Data Structures and Algorithms · Computer Science 2020-07-10 Rezaul Chowdhury , Francesco Silvestri , Flavio Vella

TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator

The increasing complexity and scale of Deep Neural Networks (DNNs) necessitate specialized tensor accelerators, such as Tensor Processing Units (TPUs), to meet various computational and energy efficiency requirements. Nevertheless,…

Hardware Architecture · Computer Science 2025-03-11 Deepak Vungarala , Mohammed E. Elbtity , Sumiya Syed , Sakila Alam , Kartik Pandit , Arnob Ghosh , Ramtin Zand , Shaahin Angizi

Tensor Manipulation Unit (TMU): Reconfigurable, Near-Memory Tensor Manipulation for High-Throughput AI SoC

While recent advances in AI SoC design have focused heavily on accelerating tensor computation, the equally critical task of tensor manipulation, centered on high,volume data movement with minimal computation, remains underexplored. This…

Hardware Architecture · Computer Science 2025-06-18 Weiyu Zhou , Zheng Wang , Chao Chen , Yike Li , Yongkui Yang , Zhuoyu Wu , Anupam Chattopadhyay

Leveraging Compute-in-Memory for Efficient Generative Model Inference in TPUs

With the rapid advent of generative models, efficiently deploying these models on specialized hardware has become critical. Tensor Processing Units (TPUs) are designed to accelerate AI workloads, but their high power consumption…

Hardware Architecture · Computer Science 2025-03-04 Zhantong Zhu , Hongou Li , Wenjie Ren , Meng Wu , Le Ye , Ru Huang , Tianyu Jia

Large Scale Distributed Linear Algebra With Tensor Processing Units

We have repurposed Google Tensor Processing Units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs' fast inter-core interconnects (ICI)s, physically…

Computational Physics · Physics 2022-09-14 Adam G. M. Lewis , Jackson Beall , Martin Ganahl , Markus Hauru , Shrestha Basu Mallick , Guifre Vidal

High-resolution imaging on TPUs

The rapid evolution of artificial intelligence (AI) is leading to a new generation of hardware accelerators optimized for deep learning. Some of the designs of these accelerators are general enough to allow their use for other…

Computational Engineering, Finance, and Science · Computer Science 2019-12-18 Fantine Huot , Yi-Fan Chen , Robert Clapp , Carlos Boneti , John Anderson

Photonic tensor cores for machine learning

With an ongoing trend in computing hardware towards increased heterogeneity, domain-specific co-processors are emerging as alternatives to centralized paradigms. The tensor core unit (TPU) has shown to outperform graphic process units by…

Disordered Systems and Neural Networks · Physics 2020-11-24 Mario Miscuglio , Volker J. Sorger

High Accuracy Low Precision QR Factorization and Least Square Solver on GPU with TensorCore

Driven by the insatiable needs to process ever larger amount of data with more complex models, modern computer processors and accelerators are beginning to offer half precision floating point arithmetic support, and extremely optimized…

Mathematical Software · Computer Science 2019-12-12 Shaoshuai Zhang , Panruo Wu

Implementation of float-float operators on graphics hardware

The Graphic Processing Unit (GPU) has evolved into a powerful and flexible processor. The latest graphic processors provide fully programmable vertex and pixel processing units that support vector operations up to single floating-point…

Hardware Architecture · Computer Science 2007-05-23 Guillaume Da Graçca , David Defour

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s

This paper presents a novel nearest neighbor search algorithm achieving TPU (Google Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms with similar level of recall. The design of the proposed algorithm…

Performance · Computer Science 2022-07-01 Felix Chern , Blake Hechtman , Andy Davis , Ruiqi Guo , David Majnemer , Sanjiv Kumar

GTA: a new General Tensor Accelerator with Better Area Efficiency and Data Reuse

Recently, tensor algebra have witnessed significant applications across various domains. Each operator in tensor algebra features different computational workload and precision. However, current general accelerators, such as VPU, GPGPU, and…

Hardware Architecture · Computer Science 2024-05-06 Chenyang Ai , Lechuan Zhao , Zhijie Huang , Cangyuan Li , Xinan Wang , Ying Wang

High-Throughput Parallel Viterbi Decoder on GPU Tensor Cores

Many research works have been performed on implementation of Vitrerbi decoding algorithm on GPU instead of FPGA because this platform provides considerable flexibility in addition to great performance. Recently, the recently-introduced…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-30 Alireza Mohammadidoost , Matin Hashemi

TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs

Precise hardware performance models play a crucial role in code optimizations. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the…

Machine Learning · Computer Science 2023-12-07 Phitchaya Mangpo Phothilimthana , Sami Abu-El-Haija , Kaidi Cao , Bahare Fatemi , Mike Burrows , Charith Mendis , Bryan Perozzi

Hardware Acceleration of Explainable Machine Learning using Tensor Processing Units

Machine learning (ML) is successful in achieving human-level performance in various fields. However, it lacks the ability to explain an outcome due to its black-box nature. While existing explainable ML is promising, almost all of these…

Machine Learning · Computer Science 2021-03-23 Zhixin Pan , Prabhat Mishra