Related papers: Improving Memory Utilization in Convolutional Neur…

Condensation-Net: Memory-Efficient Network Architecture with Cross-Channel Pooling Layers and Virtual Feature Maps

"Lightweight convolutional neural networks" is an important research topic in the field of embedded vision. To implement image recognition tasks on a resource-limited hardware platform, it is necessary to reduce the memory size and the…

Computer Vision and Pattern Recognition · Computer Science 2021-04-30 Tse-Wei Chen, Motoki Yoshinaga, Hongxing Gao, Wei Tao, Dongchao Wen, Junjie Liu, Kinya Osa, Masami Kato

Fast-OverlaPIM: A Fast Overlap-driven Mapping Framework for Processing In-Memory Neural Network Acceleration

Processing in-memory (PIM) is promising to accelerate neural networks (NNs) because it minimizes data movement and provides large computational parallelism. Similar to machine learning accelerators, application mapping, which determines the…

Hardware Architecture · Computer Science 2024-07-02 Xuan Wang, Minxuan Zhou, Tajana Rosing

Dataflow Aware Mapping of Convolutional Neural Networks Onto Many-Core Platforms With Network-on-Chip Interconnect

Machine intelligence, especially using convolutional neural networks (CNNs), has become a large area of research over the past years. Increasingly sophisticated hardware accelerators are proposed that exploit e.g. the sparsity in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-23 Andreas Bytyn, René Ahlsdorf, Rainer Leupers, Gerd Ascheid

Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression

Existing deep convolutional neural networks (CNNs) generate massive interlayer feature data during network inference. To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature…

Hardware Architecture · Computer Science 2021-10-13 Zhuang Shao, Xiaoliang Chen, Li Du, Lei Chen, Yuan Du, Wei Zhuang, Huadong Wei, Chenjia Xie, Zhongfeng Wang

Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs

Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to the substantial compute and memory operations, however, they require significant execution time. The massive…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-13 Chao Li, Yi Yang, Min Feng, Srimat Chakradhar, Huiyang Zhou

An Efficient Accelerator Design Methodology for Deformable Convolutional Networks

Deformable convolutional networks have demonstrated outstanding performance in object recognition tasks with an effective feature extraction. Unlike standard convolution, the deformable convolution decides the receptive field size using…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-16 Saehyun Ahn, Jung-Woo Chang, Suk-Ju Kang

Accelerating Convolutional Neural Networks via Activation Map Compression

The deep learning revolution brought us an extensive array of neural network architectures that achieve state-of-the-art performance in a wide variety of Computer Vision tasks including, among others, classification, detection and…

Computer Vision and Pattern Recognition · Computer Science 2019-03-28 Georgios Georgiadis

Extended Bit-Plane Compression for Convolutional Neural Network Accelerators

After the tremendous success of convolutional neural networks in image classification, object detection, speech recognition, etc., there is now rising demand for deployment of these compute-intensive ML models on tightly power constrained…

Computer Vision and Pattern Recognition · Computer Science 2019-03-05 Lukas Cavigelli, Luca Benini

DYNAMAP: Dynamic Algorithm Mapping Framework for Low Latency CNN Inference

Most of the existing work on FPGA acceleration of Convolutional Neural Networks (CNNs) focuses on employing a single strategy (algorithm, dataflow, etc.) across all the layers. Such an approach does not achieve optimal latency on complex and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-16 Yuan Meng, Sanmukh Kuppannagari, Rajgopal Kannan, Viktor Prasanna

Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators

Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors, including a weight quantization strategy (i.e., data types and bit-widths) and mapping (i.e.,…

Hardware Architecture · Computer Science 2025-07-23 Jan Klhufek, Miroslav Safar, Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina

High Performance Convolution Using Sparsity and Patterns for Inference in Deep Convolutional Neural Networks

Deploying deep Convolutional Neural Networks (CNNs) is impacted by their memory footprint and speed requirements, which mainly come from convolution. Widely-used convolution algorithms, im2col and MEC, produce a lowered matrix from an…

Computer Vision and Pattern Recognition · Computer Science 2021-04-20 Hossam Amer, Ahmed H. Salamah, Ahmad Sajedi, En-hui Yang

Performance evaluation of an integrated photonic convolutional neural network based on delay buffering and wavelength division multiplexing

Photonic technologies have shown a promising way to build high-speed and high-energy-efficiency neural network accelerators. In previously presented photonic neural networks, architectures are mainly designed for fully-connected layers.…

Signal Processing · Electrical Eng. & Systems 2020-03-02 Shaofu Xu, Jing Wang, Weiwen Zou

Memory-Efficient Point Cloud Registration via Overlapping Region Sampling

Recent advances in deep learning have improved 3D point cloud registration but increased graphics processing unit (GPU) memory usage, often requiring preliminary sampling that reduces accuracy. We propose an overlapping region sampling…

Computer Vision and Pattern Recognition · Computer Science 2024-10-30 Tomoyasu Shimada, Kazuhiko Murasaki, Shogo Sato, Toshihiko Nishimura, Taiga Yoshida, Ryuichi Tanida

Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs

Convolution is a fundamental operation in many applications, such as computer vision, natural language processing, image processing, etc. Recent successes of convolutional neural networks in various deep learning applications put even…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-31 Xiaoming Chen, Jianxu Chen, Danny Z. Chen, Xiaobo Sharon Hu

Inverted Activations: Reducing Memory Footprint in Neural Network Training

The scaling of neural networks with increasing data and model sizes necessitates the development of more efficient deep learning algorithms. A significant challenge in neural network training is the memory footprint associated with…

Machine Learning · Computer Science 2024-10-08 Georgii Novikov, Ivan Oseledets

AoCStream: All-on-Chip CNN Accelerator With Stream-Based Line-Buffer Architecture

Convolutional neural network (CNN) accelerators are being widely used for their efficiency, but they require a large amount of memory, leading to the use of a slow and power-consuming external memory. This paper exploits two schemes to…

Hardware Architecture · Computer Science 2022-12-23 Hyeong-Ju Kang

Backprop with Approximate Activations for Memory-efficient Network Training

Training convolutional neural network models is memory intensive since back-propagation requires storing activations of all intermediate layers. This presents a practical concern when seeking to deploy very deep architectures in production,…

Machine Learning · Computer Science 2019-10-30 Ayan Chakrabarti, Benjamin Moseley

COAC: Cross-layer Optimization of Accelerator Configurability for Efficient CNN Processing

To achieve high accuracy, convolutional neural networks (CNNs) are increasingly growing in complexity and diversity in layer types and topologies. This makes it very challenging to efficiently deploy such networks on custom processor…

Systems and Control · Electrical Eng. & Systems 2024-06-21 Steven Colleman, Man Shi, Marian Verhelst

Improving Efficiency in Convolutional Neural Network with Multilinear Filters

The excellent performance of deep neural networks has enabled us to solve several automatization problems, opening an era of autonomous devices. However, current deep net architectures are heavy with millions of parameters and require…

Computer Vision and Pattern Recognition · Computer Science 2018-07-06 Dat Thanh Tran, Alexandros Iosifidis, Moncef Gabbouj

Less Memory Means smaller GPUs: Backpropagation with Compressed Activations

The ever-growing scale of deep neural networks (DNNs) has led to an equally rapid growth in computational resource requirements. Many recent architectures, most prominently Large Language Models, have to be trained using supercomputers…

Machine Learning · Computer Science 2024-09-19 Daniel Barley, Holger Fröning