English
Related papers

Related papers: GPTPU: Accelerating Applications using Edge Tensor…

200 papers

Tensor Processing Units (TPUs) are specialized hardware accelerators for deep learning developed by Google. This paper aims to explore TPUs in cloud and edge computing focusing on its applications in AI. We provide an overview of TPUs,…

Hardware Architecture · Computer Science 2023-11-15 Diego Sanmartín Carrión , Vera Prohaska

Computing at the edge is important in remote settings, however, conventional hardware is not optimized for utilizing deep neural networks. The Google Edge TPU is an emerging hardware accelerator that is cost, power and speed efficient, and…

Computer Vision and Pattern Recognition · Computer Science 2022-11-01 Yipeng Sun , Andreas M Kist

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML…

Hardware Architecture · Computer Science 2024-07-12 Mohammed Elbtity , Peyton Chandarana , Ramtin Zand

Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions. While current approaches often…

Artificial Intelligence · Computer Science 2024-09-24 Rakshith Jayanth , Neelesh Gupta , Viktor Prasanna

Emerging edge computing platforms often contain machine learning (ML) accelerators that can accelerate inference for a wide range of neural network (NN) models. These models are designed to fit within the limited area and energy constraints…

Hardware Architecture · Computer Science 2021-09-30 Amirali Boroumand , Saugata Ghose , Berkin Akin , Ravi Narayanaswami , Geraldo F. Oliveira , Xiaoyu Ma , Eric Shiu , Onur Mutlu

Many artificial intelligence (AI) devices have been developed to accelerate the training and inference of neural networks models. The most common ones are the Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU). They are highly…

Machine Learning · Computer Science 2022-10-25 xiangyang Ju , Yunsong Wang , Daniel Murnane , Nicholas Choma , Steven Farrell , Paolo Calafiura

In this paper, we describe the algorithms we implemented in FDPS to make efficient use of accelerator hardware such as GPGPUs. We have developed FDPS to make it possible for many researchers to develop their own high-performance parallel…

Instrumentation and Methods for Astrophysics · Physics 2020-02-12 Masaki Iwasawa , Daisuke Namekata , Keigo Nitadori , Kentaro Nomura , Long Wang , Miyuki Tsubouchi , Junichiro Makino

Recently, tensor algebra have witnessed significant applications across various domains. Each operator in tensor algebra features different computational workload and precision. However, current general accelerators, such as VPU, GPGPU, and…

Hardware Architecture · Computer Science 2024-05-06 Chenyang Ai , Lechuan Zhao , Zhijie Huang , Cangyuan Li , Xinan Wang , Ying Wang

Graph neural networks (GNNs) have seen extensive application in domains such as social networks, bioinformatics, and recommendation systems. However, the irregularity and sparsity of graph data challenge traditional computing methods, which…

Machine Learning · Computer Science 2025-02-25 Ka Wai Wu

Edge TPUs are a domain of accelerators for low-power, edge devices and are widely used in various Google products such as Coral and Pixel devices. In this paper, we first discuss the major microarchitectural details of Edge TPUs. Then, we…

Machine Learning · Computer Science 2022-10-12 Kiran Seshadri , Berkin Akin , James Laudon , Ravi Narayanaswami , Amir Yazdanbakhsh

In this paper, we propose TensorFHE, an FHE acceleration solution based on GPGPU for real applications on encrypted data. TensorFHE utilizes Tensor Core Units (TCUs) to boost the computation of Number Theoretic Transform (NTT), which is the…

Hardware Architecture · Computer Science 2023-01-02 Shengyu Fan , Zhiwei Wang , Weizhi Xu , Rui Hou , Dan Meng , Mingzhe Zhang

Tensor accelerators now represent a growing share of compute resources in modern CPUs and GPUs. However, they are hard to program, leading developers to use vendor-provided kernel libraries that support tensor accelerators. As a result, the…

Programming Languages · Computer Science 2026-02-12 Yihong Zhang , Derek Gerstmann , Andrew Adams , Maaz Bin Safeer Ahmad

Neural Networks (NN) provide a solid and reliable way of executing different types of applications, ranging from speech recognition to medical diagnosis, speeding up onerous and long workloads. The challenges involved in their…

Hardware Architecture · Computer Science 2023-09-26 Federico Manca , Francesco Ratto

Recently, graph neural networks (GNNs), as the backbone of graph-based machine learning, demonstrate great success in various domains (e.g., e-commerce). However, the performance of GNNs is usually unsatisfactory due to the highly sparse…

Machine Learning · Computer Science 2023-06-02 Yuke Wang , Boyuan Feng , Zheng Wang , Guyue Huang , Yufei Ding

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that…

Graph neural networks (GNNs) start to gain momentum after showing significant performance improvement in a variety of domains including molecular science, recommendation, and transportation. Turning such performance improvement of GNNs into…

Hardware Architecture · Computer Science 2021-07-20 Zhihui Zhang , Jingwen Leng , Shuwen Lu , Youshan Miao , Yijia Diao , Minyi Guo , Chao Li , Yuhao Zhu

There are plenty of graph neural network (GNN) accelerators being proposed. However, they highly rely on users' hardware expertise and are usually optimized for one specific GNN model, making them challenging for practical use. Therefore,…

Hardware Architecture · Computer Science 2025-10-27 Stefan Abi-Karam , Cong Hao

General Purpose Graphic Processing Unit(GPGPU) is used widely for achieving high performance or high throughput in parallel programming. This capability of GPGPUs is very famous in the new era and mostly used for scientific computing which…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-10 Vajira Thambawita , Roshan G. Ragel , Dhammike Elkaduwe

General Purpose Graphics Processing Unit (GPGPU) computing plays a transformative role in deep learning and machine learning by leveraging the computational advantages of parallel processing. Through the power of Compute Unified Device…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-20 Ming Li , Ziqian Bi , Tianyang Wang , Yizhu Wen , Qian Niu , Xinyuan Song , Zekun Jiang , Junyu Liu , Benji Peng , Sen Zhang , Xuanhe Pan , Jiawei Xu , Jinlang Wang , Keyu Chen , Caitlyn Heqi Yin , Pohsun Feng , Ming Liu

Low-latency, low-power portable recurrent neural network (RNN) accelerators offer powerful inference capabilities for real-time applications such as IoT, robotics, and human-machine interaction. We propose a lightweight Gated Recurrent Unit…

Hardware Architecture · Computer Science 2020-12-29 Chang Gao , Antonio Rios-Navarro , Xi Chen , Shih-Chii Liu , Tobi Delbruck
‹ Prev 1 2 3 10 Next ›