Related papers: Optimizing Memory-Access Patterns for Deep Learnin…
Deep learning (DL) has emerged as a rapidly developing advanced technology, enabling the performance of complex tasks involving image recognition, natural language processing, and autonomous decision-making with high levels of accuracy.…
Recent trends in deep learning (DL) have made hardware accelerators essential for various high-performance computing (HPC) applications, including image classification, computer vision, and speech recognition. This survey summarizes and…
As machine learning applications continue to evolve, the demand for efficient hardware accelerators, specifically tailored for deep neural networks (DNNs), becomes increasingly vital. In this paper, we propose a configurable memory…
The design and implementation of Deep Learning (DL) models is currently receiving a lot of attention from both industrials and academics. However, the computational workload associated with DL is often out of reach for low-power embedded…
Deep learning (DL) has been widely adopted those last years but they are computing-intensive method. Therefore, scientists proposed diverse optimization to accelerate their predictions for end-user applications. However, no single inference…
Deep learning (DL) accelerators are increasingly deployed on edge devices to support fast local inferences. However, they suffer from a new security problem, i.e., being vulnerable to physical access based attacks. An adversary can easily…
The growing adoption of Deep Learning (DL) applications in the Internet of Things has increased the demand for energy-efficient accelerators. Field Programmable Gate Arrays (FPGAs) offer a promising platform for such acceleration due to…
There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the…
As the models and the datasets to train deep learning (DL) models scale, system architects are faced with new challenges, one of which is the memory capacity bottleneck, where the limited physical memory inside the accelerator device…
Deep learning (DL) models have become core modules for many applications. However, deploying these models without careful performance benchmarking that considers both hardware and software's impact often leads to poor service and costly…
The increasing complexity of transformer models in artificial intelligence expands their computational costs, memory usage, and energy consumption. Hardware acceleration tackles the ensuing challenges by designing processors and…
Deploying Deep Learning (DL) on embedded end devices is a scorching trend in pervasive computing. Since most Microcontrollers on embedded devices have limited computing power, it is necessary to add a DL accelerator. Embedded Field…
Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph…
Deep learning (DL) is becoming the cornerstone of numerous applications both in datacenters and at the edge. Specialized hardware is often necessary to meet the performance requirements of state-of-the-art DL models, but the rapid pace of…
In recent years, domain-specific hardware has brought significant performance improvements in deep learning (DL). Both industry and academia only focus on throughput when evaluating these AI accelerators, which usually are custom ASICs…
Accelerating deep model training and inference is crucial in practice. Existing deep learning frameworks usually concentrate on optimizing training speed and pay fewer attentions to inference-specific optimizations. Actually, model…
Deep Recommender Models (DLRMs) inference is a fundamental AI workload accounting for more than 79% of the total AI workload in Meta's data centers. DLRMs' performance bottleneck is found in the embedding layers, which perform many random…
Specialized hardware accelerators have been extensively used for Deep Neural Networks (DNNs) to provide power/performance benefits. These accelerators contain specialized hardware that supports DNN operators, and scratchpad memory for…
Deep learning (DL) for network models have achieved excellent performance in the field and are becoming a promising component in future intelligent network system. Programmable in-network computing device has great potential to deploy DL…
Fault-tolerant deep learning accelerator is the basis for highly reliable deep learning processing and critical to deploy deep learning in safety-critical applications such as avionics and robotics. Since deep learning is known to be…