Related papers: Accelerated CNN Training Through Gradient Approxim…

Accelerating Very Deep Convolutional Networks for Classification and Detection

This paper aims to accelerate the test-time computation of convolutional neural networks (CNNs), especially very deep CNNs that have substantially impacted the computer vision community. Unlike previous methods that are designed for…

Computer Vision and Pattern Recognition · Computer Science · 2015-11-19 · Xiangyu Zhang, Jianhua Zou, Kaiming He, Jian Sun

Fast Training of Convolutional Neural Networks via Kernel Rescaling

Training deep Convolutional Neural Networks (CNN) is a time-consuming task that may take weeks to complete. In this article we propose a novel, theoretically founded method for reducing CNN training time without incurring any loss in…

Computer Vision and Pattern Recognition · Computer Science · 2016-10-13 · Pedro Porto Buarque de Gusmão, Gianluca Francini, Skjalg Lepsøy, Enrico Magli

Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes

Synchronized stochastic gradient descent (SGD) optimizers with data parallelism are widely used in training large-scale deep neural networks. Although using larger mini-batch sizes can improve the system scalability by reducing the…

Machine Learning · Computer Science · 2018-07-31 · Xianyan Jia, Shutao Song, Wei He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Liwei Yu, Tiegang Chen, Guangxiao Hu, Shaohuai Shi, Xiaowen Chu

Efficient and Accurate Approximations of Nonlinear Convolutional Networks

This paper aims to accelerate the test-time computation of deep convolutional neural networks (CNNs). Unlike existing methods that are designed for approximating linear filters or linear responses, our method takes the nonlinear units into…

Computer Vision and Pattern Recognition · Computer Science · 2014-11-18 · Xiangyu Zhang, Jianhua Zou, Xiang Ming, Kaiming He, Jian Sun

Patch Gradient Descent: Training Neural Networks on Very Large Images

Traditional CNN models are trained and tested on relatively low-resolution images (<300 px) and cannot be applied directly to large-scale images due to compute and memory constraints. We propose Patch Gradient Descent (PatchGD), an…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-01 · Deepak K. Gupta, Gowreesh Mago, Arnav Chavan, Dilip K. Prasad

When deep learning models on GPU can be accelerated by taking advantage of unstructured sparsity

This paper focuses on improving the efficiency of sparse convolutional neural network (CNN) layers on graphics processing units (GPUs). The Nvidia deep neural network library (cuDNN) provides the most effective implementation…

Machine Learning · Computer Science · 2022-01-03 · Marcin Pietroń, Dominik Żurek

Faster Neural Network Training with Approximate Tensor Operations

We propose a novel technique for faster deep neural network training which systematically applies sample-based approximation to the constituent tensor operations, i.e., matrix multiplications and convolutions. We introduce new sampling…

Machine Learning · Computer Science · 2021-10-27 · Menachem Adelman, Kfir Y. Levy, Ido Hakimi, Mark Silberstein
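The abstract above only names the idea of sample-based approximation of matrix multiplications. As a rough illustration, here is a minimal sketch of the textbook column-row sampling estimator, which is not necessarily the sampling scheme the paper itself introduces. The helper names `matmul` and `sampled_matmul` and the sample count `k` are hypothetical, and plain Python lists are used only to keep the sketch dependency-free.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical helper: exact product, used only as a reference."""
    n, p = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(n)) for j in range(p)] for row in a]


def sampled_matmul(a: Matrix, b: Matrix, k: int, rng: random.Random) -> Matrix:
    """Hypothetical helper: approximate a @ b from k sampled column-row outer products.

    Generic column-row sampling (not taken from the paper): index t is drawn
    with replacement with probability proportional to ||a[:, t]|| * ||b[t, :]||,
    and each sampled outer product a[:, t] b[t, :] is rescaled by
    1 / (k * prob[t]) so that the estimate is unbiased.
    """
    m, n, p = len(a), len(b), len(b[0])
    col_norms = [sum(a[i][t] ** 2 for i in range(m)) ** 0.5 for t in range(n)]
    row_norms = [sum(b[t][j] ** 2 for j in range(p)) ** 0.5 for t in range(n)]
    weights = [c * r for c, r in zip(col_norms, row_norms)]
    total = sum(weights)

    out = [[0.0] * p for _ in range(m)]
    if total == 0.0:
        # Every outer product is zero, so the exact product is zero as well.
        return out

    probs = [w / total for w in weights]
    # Zero-probability indices are never drawn, so the division below is safe.
    for t in rng.choices(range(n), weights=probs, k=k):
        scale = 1.0 / (k * probs[t])
        for i in range(m):
            ai = a[i][t] * scale
            for j in range(p):
                out[i][j] += ai * b[t][j]
    return out
```

A single call with small `k` touches only `k` of the `n` outer products, trading accuracy for work; averaging many independent calls converges to the exact product because each sampled term is reweighted by its inverse sampling probability.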

Training Neural Networks for Execution on Approximate Hardware

Approximate computing methods have shown great potential for deep learning. Due to the reduced hardware costs, these methods are especially suitable for inference tasks on battery-operated devices that are constrained by their power budget.…

Machine Learning · Computer Science · 2023-04-11 · Tianmu Li, Shurui Li, Puneet Gupta

ResNet: Enabling Deep Convolutional Neural Networks through Residual Learning

Convolutional Neural Networks (CNNs) have revolutionized computer vision, but training very deep networks has been challenging due to the vanishing gradient problem. This paper explores Residual Networks (ResNet), introduced by He et al.…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-29 · Xingyu Liu, Kun Ming Goh

Gradient Amplification: An efficient way to train deep neural networks

Improving the performance of deep learning models and reducing their training times are ongoing challenges in deep neural networks. Several approaches have been proposed to address these challenges, one of which is to increase the depth of the…

Machine Learning · Computer Science · 2020-06-20 · Sunitha Basodi, Chunyan Ji, Haiping Zhang, Yi Pan

Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

Convolutional neural networks (CNNs) have achieved major breakthroughs in recent years. Their performance in computer vision has matched and in some areas even surpassed human capabilities. Deep neural networks can capture complex…

Computer Vision and Pattern Recognition · Computer Science · 2016-05-23 · Philipp Gysel

AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Neural Networks

Ultra-deep neural networks (UDNNs) typically yield high-quality models, but their training process is usually resource-intensive and time-consuming. The scarce DRAM capacity of modern GPUs is the primary bottleneck that hinders the…

Machine Learning · Computer Science · 2019-06-21 · Jinrong Guo, Wantao Liu, Wang Wang, Qu Lu, Songlin Hu, Jizhong Han, Ruixuan Li

FireCaffe: near-linear acceleration of deep neural network training on compute clusters

Long training times for high-accuracy deep neural networks (DNNs) impede research into new DNN architectures and slow the development of high-accuracy DNNs. In this paper we present FireCaffe, which successfully scales deep neural network…

Computer Vision and Pattern Recognition · Computer Science · 2016-01-11 · Forrest N. Iandola, Khalid Ashraf, Matthew W. Moskewicz, Kurt Keutzer

A Hitchhiker's Guide On Distributed Training of Deep Neural Networks

Deep learning has led to tremendous advancements in the field of Artificial Intelligence. One caveat, however, is the substantial amount of compute needed to train these deep learning models. Training a benchmark dataset like ImageNet on a…

Machine Learning · Computer Science · 2018-10-30 · Karanbir Chahal, Manraj Singh Grover, Kuntal Dey

Accurate, Efficient and Scalable Training of Graph Neural Networks

Graph Neural Networks (GNNs) are powerful deep learning models for generating node embeddings on graphs. When applying deep GNNs to large graphs, it is still challenging to perform training in an efficient and scalable way. We propose a novel…

Machine Learning · Computer Science · 2020-10-08 · Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, Viktor Prasanna

An Adaptive Remote Stochastic Gradient Method for Training Neural Networks

We present the remote stochastic gradient (RSG) method, which computes the gradients at configurable remote observation points, in order to improve the convergence rate and suppress gradient noise at the same time for different curvatures.…

Machine Learning · Computer Science · 2020-09-08 · Yushu Chen, Hao Jing, Wenlai Zhao, Zhiqiang Liu, Ouyi Li, Liang Qiao, Wei Xue, Guangwen Yang

Defending with Errors: Approximate Computing for Robustness of Deep Neural Networks

Machine-learning architectures, such as Convolutional Neural Networks (CNNs), are vulnerable to adversarial attacks: inputs crafted carefully to force the system output to a wrong label. Since machine learning is being deployed in…

Cryptography and Security · Computer Science · 2022-11-03 · Amira Guesmi, Ihsen Alouani, Khaled N. Khasawneh, Mouna Baklouti, Tarek Frikha, Mohamed Abid, Nael Abu-Ghazaleh

Scaling Deep Learning on GPU and Knights Landing clusters

The speed of deep neural network training has become a major bottleneck in deep learning research and development. For example, training GoogleNet on the ImageNet dataset with one Nvidia K20 GPU takes 21 days. To speed up the training process, the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-08-11 · Yang You, Aydin Buluc, James Demmel

8-Bit Approximations for Parallelism in Deep Learning

The creation of practical deep learning data-products often requires parallelization across processors and computers to make deep learning feasible on large data sets, but bottlenecks in communication bandwidth make it difficult to attain…

Neural and Evolutionary Computing · Computer Science · 2016-02-22 · Tim Dettmers

TFApprox: Towards a Fast Emulation of DNN Approximate Hardware Accelerators on GPU

Energy efficiency of hardware accelerators of deep neural networks (DNN) can be improved by introducing approximate arithmetic circuits. In order to quantify the error introduced by using these circuits and avoid the expensive hardware…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-07-03 · Filip Vaverka, Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina