Related papers: Improving Memory Utilization in Convolutional Neur…
"Lightweight convolutional neural networks" is an important research topic in the field of embedded vision. To implement image recognition tasks on a resource-limited hardware platform, it is necessary to reduce the memory size and the…
Processing in-memory (PIM) is promising to accelerate neural networks (NNs) because it minimizes data movement and provides large computational parallelism. Similar to machine learning accelerators, application mapping, which determines the…
Machine intelligence, especially using convolutional neural networks (CNNs), has become a large area of research over the past years. Increasingly sophisticated hardware accelerators are proposed that exploit e.g. the sparsity in…
Existing deep convolutional neural networks (CNNs) generate massive interlayer feature data during network inference. To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature…
Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to the substantial compute and memory operations, however, they require significant execution time. The massive…
Deformable convolutional networks have demonstrated outstanding performance in object recognition tasks with an effective feature extraction. Unlike standard convolution, the deformable convolution decides the receptive field size using…
The deep learning revolution brought us an extensive array of neural network architectures that achieve state-of-the-art performance in a wide variety of Computer Vision tasks including among others, classification, detection and…
After the tremendous success of convolutional neural networks in image classification, object detection, speech recognition, etc., there is now rising demand for deployment of these compute-intensive ML models on tightly power constrained…
Most of the existing work on FPGA acceleration of Convolutional Neural Network (CNN) focus on employing a single strategy (algorithm, dataflow, etc.) across all the layers. Such an approach does not achieve optimal latency on complex and…
Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors, including a weight quantization strategy (i.e., data types and bit-widths) and mapping (i.e.,…
Deploying deep Convolutional Neural Networks (CNNs) is impacted by their memory footprint and speed requirements, which mainly come from convolution. Widely-used convolution algorithms, im2col and MEC, produce a lowered matrix from an…
Photonic technologies have shown a promising way to build high-speed and high-energy-efficiency neural network accelerators. In previously presented photonic neural networks, architectures are mainly designed for fully-connected layers.…
Recent advances in deep learning have improved 3D point cloud registration but increased graphics processing unit (GPU) memory usage, often requiring preliminary sampling that reduces accuracy. We propose an overlapping region sampling…
Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, image processing, etc. Recent successes of convolutional neural networks in various deep learning applications put even…
The scaling of neural networks with increasing data and model sizes necessitates the development of more efficient deep learning algorithms. A significant challenge in neural network training is the memory footprint associated with…
Convolutional neural network (CNN) accelerators are being widely used for their efficiency, but they require a large amount of memory, leading to the use of a slow and power consuming external memory. This paper exploits two schemes to…
Training convolutional neural network models is memory intensive since back-propagation requires storing activations of all intermediate layers. This presents a practical concern when seeking to deploy very deep architectures in production,…
To achieve high accuracy, convolutional neural networks (CNNs) are increasingly growing in complexity and diversity in layer types and topologies. This makes it very challenging to efficiently deploy such networks on custom processor…
The excellent performance of deep neural networks has enabled us to solve several automatization problems, opening an era of autonomous devices. However, current deep net architectures are heavy with millions of parameters and require…
The ever-growing scale of deep neural networks (DNNs) has lead to an equally rapid growth in computational resource requirements. Many recent architectures, most prominently Large Language Models, have to be trained using supercomputers…