Related papers: Optimizing Memory-Access Patterns for Deep Learnin…

200 papers

Deep learning (DL) has emerged as a rapidly developing advanced technology, enabling the performance of complex tasks involving image recognition, natural language processing, and autonomous decision-making with high levels of accuracy.…

Hardware Architecture · Computer Science 2026-03-11 Soumita Chatterjee, Sudip Ghosh, Tamal Ghosh, Hafizur Rahaman

Recent trends in deep learning (DL) have made hardware accelerators essential for various high-performance computing (HPC) applications, including image classification, computer vision, and speech recognition. This survey summarizes and…

As machine learning applications continue to evolve, the demand for efficient hardware accelerators, specifically tailored for deep neural networks (DNNs), becomes increasingly vital. In this paper, we propose a configurable memory…

Hardware Architecture · Computer Science 2024-04-25 Oliver Bause, Paul Palomero Bernardo, Oliver Bringmann

The design and implementation of Deep Learning (DL) models is currently receiving a lot of attention from both industrials and academics. However, the computational workload associated with DL is often out of reach for low-power embedded…

Hardware Architecture · Computer Science 2022-12-09 Etienne Dupuis, Silviu-Ioan Filip, Olivier Sentieys, David Novo, Ian O'Connor, Alberto Bosio

Deep learning (DL) has been widely adopted in recent years, but it is a computing-intensive method. Therefore, scientists have proposed diverse optimizations to accelerate its predictions for end-user applications. However, no single inference…

Machine Learning · Computer Science 2022-10-11 Pierrick Pochelu

Deep learning (DL) accelerators are increasingly deployed on edge devices to support fast local inferences. However, they suffer from a new security problem, i.e., being vulnerable to physical access based attacks. An adversary can easily…

Hardware Architecture · Computer Science 2020-08-11 Pengfei Zuo, Yu Hua, Ling Liang, Xinfeng Xie, Xing Hu, Yuan Xie

The growing adoption of Deep Learning (DL) applications in the Internet of Things has increased the demand for energy-efficient accelerators. Field Programmable Gate Arrays (FPGAs) offer a promising platform for such acceleration due to…

Hardware Architecture · Computer Science 2025-04-15 Chao Qian

There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the…

Machine Learning · Computer Science 2024-05-06 Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu

As the models and the datasets to train deep learning (DL) models scale, system architects are faced with new challenges, one of which is the memory capacity bottleneck, where the limited physical memory inside the accelerator device…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-19 Youngeun Kwon, Minsoo Rhu

Deep learning (DL) models have become core modules for many applications. However, deploying these models without careful performance benchmarking that considers both hardware and software's impact often leads to poor service and costly…

Machine Learning · Computer Science 2021-01-06 Huaizheng Zhang, Yizheng Huang, Yonggang Wen, Jianxiong Yin, Kyle Guan

The increasing complexity of transformer models in artificial intelligence expands their computational costs, memory usage, and energy consumption. Hardware acceleration tackles the ensuing challenges by designing processors and…

Hardware Architecture · Computer Science 2023-12-21 Alireza Amirshahi, Giovanni Ansaloni, David Atienza

Deploying Deep Learning (DL) on embedded end devices is a scorching trend in pervasive computing. Since most Microcontrollers on embedded devices have limited computing power, it is necessary to add a DL accelerator. Embedded Field…

Hardware Architecture · Computer Science 2024-09-17 Chao Qian, Tianheng Ling, Gregor Schiele

Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph…

Databases · Computer Science 2021-02-09 Jonas Dann, Daniel Ritter, Holger Fröning

Deep learning (DL) is becoming the cornerstone of numerous applications both in datacenters and at the edge. Specialized hardware is often necessary to meet the performance requirements of state-of-the-art DL models, but the rapid pace of…

Hardware Architecture · Computer Science 2025-12-16 Andrew Boutros, Aman Arora, Vaughn Betz

In recent years, domain-specific hardware has brought significant performance improvements in deep learning (DL). Both industry and academia only focus on throughput when evaluating these AI accelerators, which usually are custom ASICs…

Performance · Computer Science 2019-11-11 Zihan Jiang, Jiansong Li, Jiangfeng Zhan

Accelerating deep model training and inference is crucial in practice. Existing deep learning frameworks usually concentrate on optimizing training speed and pay less attention to inference-specific optimizations. Actually, model…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-12 Yongchao Liu, Yue Jin, Yong Chen, Teng Teng, Hang Ou, Rui Zhao, Yao Zhang

Deep Recommender Models (DLRMs) inference is a fundamental AI workload accounting for more than 79% of the total AI workload in Meta's data centers. DLRMs' performance bottleneck is found in the embedding layers, which perform many random…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-03 Giuseppe Ruggeri, Renzo Andri, Daniele Jahier Pagliari, Lukas Cavigelli

Specialized hardware accelerators have been extensively used for Deep Neural Networks (DNNs) to provide power/performance benefits. These accelerators contain specialized hardware that supports DNN operators, and scratchpad memory for…

Machine Learning · Computer Science 2023-12-01 Yi Li, Aarti Gupta, Sharad Malik

Deep learning (DL) for network models has achieved excellent performance in the field and is becoming a promising component in future intelligent network systems. Programmable in-network computing devices have great potential to deploy DL…

Hardware Architecture · Computer Science 2023-08-23 Dong Wen, Tao Li, Chenglong Li, Pengye Xia, Hui Yang, Zhigang Sun

Fault-tolerant deep learning accelerator is the basis for highly reliable deep learning processing and critical to deploy deep learning in safety-critical applications such as avionics and robotics. Since deep learning is known to be…

Hardware Architecture · Computer Science 2023-12-22 Qing Zhang, Cheng Liu, Bo Liu, Haitong Huang, Ying Wang, Huawei Li, Xiaowei Li