Related papers: HAPI: Hardware-Aware Progressive Inference
During the last years, algorithms known as Convolutional Neural Networks (CNNs) had become increasingly popular, expanding its application range to several areas. In particular, the image processing field has experienced a remarkable…
DNNs are becoming less and less over-parametrised due to recent advances in efficient model design, through careful hand-crafted or NAS-based methods. Relying on the fact that not all inputs require the same amount of computation to yield a…
This paper is aimed at developing a method that reduces the computational cost of convolutional neural networks (CNN) during inference. Conventionally, the input data pass through a fixed neural network architecture. However, easy examples…
Deep neural networks have significantly improved performance on a range of tasks with the increasing demand for computational resources, leaving deployment on low-resource devices (with limited memory and battery power) infeasible. Binary…
By leveraging the data sample diversity, the early-exit network recently emerges as a prominent neural network architecture to accelerate the deep learning inference process. However, intermediate classifiers of the early exits introduce…
For time-critical IoT applications using deep learning, inference acceleration through distributed computing is a promising approach to meet a stringent deadline. In this paper, we implement a working prototype of a new distributed…
In order to enhance the real-time performance of convolutional neural networks(CNNs), more and more researchers are focusing on improving the efficiency of CNN. Based on the analysis of some CNN architectures, such as ResNet, DenseNet,…
This letter presents a novel high impedance fault (HIF) detection approach using a convolutional neural network (CNN). Compared to traditional artificial neural networks, a CNN offers translation invariance and it can accurately detect HIFs…
Deep Neural Networks (DNNs) have grown increasingly large in size to achieve state of the art performance across a wide range of tasks. However, their high computational requirements make them less suitable for resource-constrained…
Deep neural networks (DNNs) have been successfully applied in various fields. In DNNs, a large number of multiply-accumulate (MAC) operations are required to be performed, posing critical challenges in applying them in resource-constrained…
Convolutional neural network (CNN) inference using fully homomorphic encryption (FHE) is a promising private inference (PI) solution due to the capability of FHE that enables offloading the whole computation process to the server while…
This paper optimizes the Convolutional Neural Network (CNN) algorithm using high-performance computing (HPC) technologies. It uses multi-core processors, GPUs, and parallel computing frameworks like OpenMPI and CUDA to speed up CNN model…
Distributed inference techniques can be broadly classified into data-distributed and model-distributed schemes. In data-distributed inference (DDI), each worker carries the entire deep neural network (DNN) model but processes only a subset…
Security of inference phase deployment of Convolutional neural network (CNN) into resource constrained embedded systems (e.g. low end FPGAs) is a growing research area. Using secure practices, third party FPGA designers can be provided with…
Early-exit neural networks (EENNs) accelerate inference by allowing intermediate classifiers to stop computation once predictions are confident enough. Most methods rely on confidence thresholds for exiting, and consequently, improving…
Well-trained deep neural networks (DNNs) treat all test samples equally during prediction. Adaptive DNN inference with early exiting leverages the observation that some test examples can be easier to predict than others. This paper presents…
Convolutional Neural Networks (CNNs) combine large amounts of parallelizable computation with frequent memory access. Field Programmable Gate Arrays (FPGAs) can achieve low latency and high throughput CNN inference by implementing dataflow…
Transformers are ubiquitous in Natural Language Processing (NLP) tasks, but they are difficult to be deployed on hardware due to the intensive computation. To enable low-latency inference on resource-constrained hardware platforms, we…
We present both a novel Convolutional Neural Network (CNN) accelerator architecture and a network compiler for FPGAs that outperforms all prior work. Instead of having generic processing elements that together process one layer at a time,…
Deployment of dynamic neural networks on edge accelerators requires careful consideration of hardware constraints beyond conventional complexity metrics such as Multiply-Accumulate operations. In Early-Exiting Neural Networks (EENN), exit…