Related papers: Exploring the Vision Processing Unit as Co-process…

200 papers

The attention layer, a core component of Transformer-based LLMs, brings out inefficiencies in current GPU systems due to its low operational intensity and the substantial memory requirements of KV caches. We propose a High-bandwidth…

Hardware Architecture · Computer Science 2025-12-19 Myunghyun Rhee, Joonseop Sim, Taeyoung Ahn, Seungyong Lee, Daegun Yoon, Euiseok Kim, Kyoung Park, Youngpyo Joo, Hoshik Kim

In the next decade, the demands for computing in large scientific experiments are expected to grow tremendously. During the same time period, CPU performance increases will be limited. At the CERN Large Hadron Collider (LHC), these two…

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that…

Machine learning applications are computationally demanding and power intensive. Hardware acceleration of these software tools is a natural step being explored using various technologies. A recurrent processing unit (RPU) is fast and…

Emerging Technologies · Computer Science 2019-12-17 Heidi Komkov, Alessandro Restelli, Brian Hunt, Liam Shaughnessy, Itamar Shani, Daniel P. Lathrop

High Performance Computing (HPC) aims at providing reasonably fast computing solutions to scientific and real life problems. The advent of multicore architectures is noticeable in the HPC history, because it has brought the underlying…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-07 Claude Tadonki

This machine learning study investigates a low-cost edge device integrated with an embedded system having computer vision and resulting in an improved performance in inferencing time and precision of object detection and classification. A…

Robotics · Computer Science 2024-10-08 Richard C. Rodriguez, Jonah Elijah P. Bardos

Developing high performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g. multi-core CPUs, GPUs, and…

Computer Vision and Pattern Recognition · Computer Science 2019-07-01 Murad Qasaimeh, Kristof Denolf, Jack Lo, Kees Vissers, Joseph Zambreno, Phillip H. Jones

In recent years, Transformer has achieved good results in Natural Language Processing (NLP) and has also started to expand into Computer Vision (CV). Excellent models such as the Vision Transformer and Swin Transformer have emerged. At the…

Computer Vision and Pattern Recognition · Computer Science 2021-10-22 Wei Hu, Dian Xu, Zimeng Fan, Fang Liu, Yanxiang He

Recent research on vision backbone architectures has predominantly focused on optimizing efficiency for hardware platforms with high parallel processing capabilities. This category increasingly includes embedded systems such as mobile…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Moritz Nottebaum, Matteo Dunnhofer, Christian Micheloni

Graphics processing units (GPUs) can improve deep neural network inference throughput via batch processing, where multiple tasks are concurrently processed. We focus on novel scenarios that the energy-constrained mobile devices offload…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-14 Wenqi Shi, Sheng Zhou, Zhisheng Niu, Miao Jiang, Lu Geng

Hybrid computational architectures based on the joint power of Central Processing Units and Graphic Processing Units (GPUs) are becoming popular and powerful hardware tools for a wide range of simulations in biology, chemistry, engineering,…

Instrumentation and Methods for Astrophysics · Physics 2015-06-15 Roberto Capuzzo-Dolcetta, Mario Spera

Nowadays, we are to find out solutions to huge computing problems very rapidly. It brings the idea of parallel computing in which several machines or processors work cooperatively for computational tasks. In the past decades, there are a…

Programming Languages · Computer Science 2014-02-07 Brijender Kahanwal

Modern chip designs are increasingly complex, making it difficult for developers to glean meaningful insights about hardware behavior while real workloads are running. Hardware introspection aims to solve this by enabling the hardware…

Hardware Architecture · Computer Science 2025-09-29 Ian McDougall, Shayne Wadle, Harish Batchu, Karthikeyan Sankaralingam

Deploying multiple machine learning models on resource-constrained robotic platforms for different perception tasks often results in redundant computations, large memory footprints, and complex integration challenges. In response, this work…

Robotics · Computer Science 2025-08-19 Jakub Łucki, Jonathan Becktor, Georgios Georgakis, Rob Royce, Shehryar Khattak

High Performance Computing (HPC) platforms allow scientists to model computationally intensive algorithms. HPC clusters increasingly use General-Purpose Graphics Processing Units (GPGPUs) as accelerators; FPGAs provide an attractive…

Hardware Architecture · Computer Science 2015-04-20 Syed Waqar Nabi, Saji N. Hameed, Wim Vanderbauwhede

The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors…

Emerging Technologies · Computer Science 2015-12-10 Keith A. Britt, Travis S. Humble

With an ongoing trend in computing hardware towards increased heterogeneity, domain-specific co-processors are emerging as alternatives to centralized paradigms. The tensor core unit (TPU) has shown to outperform graphic process units by…

Disordered Systems and Neural Networks · Physics 2020-11-24 Mario Miscuglio, Volker J. Sorger

In recent decades, High Performance Computing (HPC) has undergone significant enhancements, particularly in the realm of hardware platforms, aimed at delivering increased processing power while keeping power consumption within reasonable…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-03 S.-Kazem Shekofteh, Christian Alles, Nils Kochendörfer, Holger Fröning

Efficient on-device neural network (NN) inference offers predictable latency, improved privacy and reliability, and lower operating costs for vendors than cloud-based inference. This has sparked recent development of microcontroller-scale…

Machine Learning · Computer Science 2025-11-03 Josh Millar, Yushan Huang, Sarab Sethi, Hamed Haddadi, Anil Madhavapeddy

Sliding window convolutional networks (ConvNets) have become a popular approach to computer vision problems such as image segmentation, and object detection and localization. Here we consider the problem of inference, the application of a…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-21 Aleksandar Zlateski, Kisuk Lee, H. Sebastian Seung