Related papers: Design optimization for high-performance computing…
With the development of hardware-optimized deployment of spiking neural networks (SNNs), SNN processors based on field-programmable gate arrays (FPGAs) have become a research hotspot due to their efficiency and flexibility. However,…
Residual neural networks are widely used in computer vision tasks. They enable the construction of deeper and more accurate models by mitigating the vanishing gradient problem. Their main innovation is the residual block which allows the…
FPGA is appropriate for fix-point neural networks computing due to high power efficiency and configurability. However, its design must be intensively refined to achieve high performance using limited hardware resources. We present an…
Overlays have shown significant promise for field-programmable gate-arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow. However, this often comes with a…
Intensive computation is entering data centers with multiple workloads of deep learning. To balance the compute efficiency, performance, and total cost of ownership (TCO), the use of a field-programmable gate array (FPGA) with…
Field-programmable gate array (FPGA) based accelerators are being widely used for acceleration of convolutional neural networks (CNNs) due to their potential in improving the performance and reconfigurability for specific application…
New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning…
Artificial intelligence (AI) is increasingly deployed in real-time and energy-constrained environments, driving demand for hardware platforms that can deliver high performance and power efficiency. While central processing units (CPUs) and…
Since introduced, Swin Transformer has achieved remarkable results in the field of computer vision, it has sparked the need for dedicated hardware accelerators, specifically catering to edge computing demands. For the advantages of…
Transformer-based models are becoming more and more intelligent and are revolutionizing a wide range of human tasks. To support their deployment, AI labs offer inference services that consume hundreds of GWh of energy annually and charge…
Deep neural network (DNN) inference relies increasingly on specialized hardware for high computational efficiency. This work introduces a field-programmable gate array (FPGA)-based dynamically configurable accelerator featuring systolic…
When trained as generative models, Deep Learning algorithms have shown exceptional performance on tasks involving high dimensional data such as image denoising and super-resolution. In an increasingly connected world dominated by mobile and…
Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…
Recent researches on neural network have shown significant advantage in machine learning over traditional algorithms based on handcrafted features and models. Neural network is now widely adopted in regions like image, speech and video…
Low-precision is the first order knob for achieving higher Artificial Intelligence Operations (AI-TOPS). However the algorithmic space for sub-8-bit precision compute is diverse, with disruptive changes happening frequently, making FPGAs a…
This study presents advanced neural network architectures including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTMs), and Deep Belief Networks (DBNs) for enhanced ECG signal…
Various hardware accelerators have been developed for energy-efficient and real-time inference of neural networks on edge devices. However, most training is done on high-performance GPUs or servers, and the huge memory and computing costs…
A fundamental question lies in almost every application of deep neural networks: what is the optimal neural architecture given a specific dataset? Recently, several Neural Architecture Search (NAS) frameworks have been developed that use…
In view of the large amount of calculation and long calculation time of convolutional neural network (CNN), this paper proposes a convolutional neural network hardware accelerator based on field programmable logic gate array (FPGA). First,…
Edge AI deployment faces critical challenges balancing computational performance, energy efficiency, and resource constraints. This paper presents FPGA-accelerated RISC-V instruction set architecture (ISA) extensions for efficient neural…