Related papers: Mini-batch Serialization: CNN Training with Inter-…

Restructuring Batch Normalization to Accelerate CNN Training

Batch Normalization (BN) has become a core design block of modern Convolutional Neural Networks (CNNs). A typical modern CNN has a large number of BN layers in its lean and deep architecture. BN requires mean and variance calculations over…

Computer Vision and Pattern Recognition · Computer Science 2019-03-04 Wonkyung Jung, Daejin Jung, Byeongho Kim, Sunjung Lee, Wonjong Rhee, Jung Ho Ahn

Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism

Scaling CNN training is necessary to keep up with growing datasets and reduce training time. We also see an emerging need to handle datasets with very large samples, where memory requirements for training are large. Existing training…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-18 Nikoli Dryden, Naoya Maruyama, Tom Benson, Tim Moon, Marc Snir, Brian Van Essen

Partitioning Compute Units in CNN Acceleration for Statistical Memory Traffic Shaping

The design complexity of CNNs has been steadily increasing to improve accuracy. To cope with the massive amount of computation needed for such complex CNNs, the latest solutions utilize blocking of an image over the available dimensions and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-19 Daejin Jung, Sunjung Lee, Wonjong Rhee, Jung Ho Ahn

Training CNNs faster with Dynamic Input and Kernel Downsampling

We reduce training time in convolutional networks (CNNs) with a method that, for some of the mini-batches: a) scales down the resolution of input images via downsampling, and b) reduces the forward pass operations via pooling on the…

Machine Learning · Computer Science 2019-10-16 Zissis Poulos, Ali Nouri, Andreas Moshovos

Accelerated Training for CNN Distributed Deep Learning through Automatic Resource-Aware Layer Placement

The Convolutional Neural Network (CNN) model, often used for image classification, requires significant training time to obtain high accuracy. To this end, distributed training is performed with the parameter server (PS) architecture using…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-18 Jay H. Park, Sunghwan Kim, Jinwon Lee, Myeongjae Jeon, Sam H. Noh

MBS: Macroblock Scaling for CNN Model Reduction

In this paper we propose the macroblock scaling (MBS) algorithm, which can be applied to various CNN architectures to reduce their model size. MBS adaptively reduces each CNN macroblock depending on its information redundancy measured by…

Machine Learning · Computer Science 2019-04-16 Yu-Hsun Lin, Chun-Nan Chou, Edward Y. Chang

LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory Reduction

In the last decade, Convolutional Neural Networks with multi-layer architectures have advanced rapidly. However, training such complex networks is very space-consuming, since a lot of intermediate data are preserved across layers, especially…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-23 Zhigang Wang, Hangyu Yang, Ning Wang, Chuanfei Xu, Jie Nie, Zhiqiang Wei, Yu Gu, Ge Yu

Learning Efficient Convolutional Networks through Network Slimming

The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the…

Computer Vision and Pattern Recognition · Computer Science 2017-08-23 Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, Changshui Zhang

Micro-Batch Training with Batch-Channel Normalization and Weight Standardization

Batch Normalization (BN) has become an out-of-box technique to improve deep network training. However, its effectiveness is limited for micro-batch training, i.e., each GPU typically has only 1-2 images for training, which is inevitable for…

Computer Vision and Pattern Recognition · Computer Science 2020-08-11 Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille

Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are one of the most successful deep machine learning technologies for processing image, voice and video data. CNNs require large amounts of processing capacity and memory, which can exceed the resources…

Neural and Evolutionary Computing · Computer Science 2017-08-17 James Garland, David Gregg

Dynamic Normalization

Batch Normalization has become one of the essential components in CNN. It allows the network to use a higher learning rate and speed up training. And the network doesn't need to be initialized carefully. However, in our work, we find that a…

Computer Vision and Pattern Recognition · Computer Science 2021-01-18 Chuan Liu, Yi Gao, Jiancheng Lv

Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks

In this paper, we introduce a memory-efficient CNN (convolutional neural network), which enables resource-constrained low-end embedded and IoT devices to perform on-device vision tasks, such as image classification and object detection,…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Jaewook Lee, Yoel Park, Seulki Lee

Resource-efficient Deep Neural Networks for Automotive Radar Interference Mitigation

Radar sensors are crucial for environment perception of driver assistance systems as well as autonomous vehicles. With a rising number of radar sensors and the so far unregulated automotive radar frequency band, mutual interference is…

Signal Processing · Electrical Eng. & Systems 2022-01-26 Johanna Rock, Wolfgang Roth, Mate Toth, Paul Meissner, Franz Pernkopf

AoCStream: All-on-Chip CNN Accelerator With Stream-Based Line-Buffer Architecture

Convolutional neural network (CNN) accelerators are being widely used for their efficiency, but they require a large amount of memory, leading to the use of a slow and power consuming external memory. This paper exploits two schemes to…

Hardware Architecture · Computer Science 2022-12-23 Hyeong-Ju Kang

Applications of Sequential Learning for Medical Image Classification

Purpose: The aim of this work is to develop a neural network training framework for continual training of small amounts of medical imaging data and create heuristics to assess training in the absence of a hold-out validation or test set.…

Image and Video Processing · Electrical Eng. & Systems 2023-09-27 Sohaib Naim, Brian Caffo, Haris I Sair, Craig K Jones

An Energy-Efficient Accelerator Architecture with Serial Accumulation Dataflow for Deep CNNs

Convolutional Neural Networks (CNNs) have shown outstanding accuracy for many vision tasks during recent years. When deploying CNNs on portable devices and embedded systems, however, the large number of parameters and computations result in…

Signal Processing · Electrical Eng. & Systems 2020-02-19 Mehdi Ahmadi, Shervin Vakili, J. M. Pierre Langlois

Enhanced CNN for image denoising

Owing to flexible architectures of deep convolutional neural networks (CNNs), CNNs are successfully used for image denoising. However, they suffer from the following drawbacks: (i) deep network architecture is very difficult to train. (ii)…

Computer Vision and Pattern Recognition · Computer Science 2019-03-05 Chunwei Tian, Yong Xu, Lunke Fei, Junqian Wang, Jie Wen, Nan Luo

Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators

Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors, including a weight quantization strategy (i.e., data types and bit-widths) and mapping (i.e.,…

Hardware Architecture · Computer Science 2025-07-23 Jan Klhufek, Miroslav Safar, Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina

From Hashing to CNNs: Training BinaryWeight Networks via Hashing

Deep convolutional neural networks (CNNs) have shown appealing performance on various computer vision tasks in recent years. This motivates people to deploy CNNs to real-world applications. However, most state-of-the-art CNNs require large…

Computer Vision and Pattern Recognition · Computer Science 2018-02-09 Qinghao Hu, Peisong Wang, Jian Cheng

Training Multiscale-CNN for Large Microscopy Image Classification in One Hour

Existing approaches to training neural networks on large images require either cropping or down-sampling the data during pre-processing, using small batch sizes, or splitting the model across devices, mainly due to the prohibitively limited memory…

Image and Video Processing · Electrical Eng. & Systems 2020-03-12 Kushal Datta, Imtiaz Hossain, Sun Choi, Vikram Saletore, Kyle Ambert, William J. Godinez, Xian Zhang