Related papers: Backprop with Approximate Activations for Memory-e…

200 papers

Fine-tuning pretrained large models to downstream tasks is an important problem, which however suffers from huge memory overhead due to large-scale parameters. This work strives to reduce memory overhead in fine-tuning from perspectives of…

Machine Learning · Computer Science 2024-06-25 Yuchen Yang, Yingdong Shi, Cheems Wang, Xiantong Zhen, Yuxuan Shi, Jun Xu

We propose proximal backpropagation (ProxProp) as a novel algorithm that takes implicit instead of explicit gradient steps to update the network parameters during neural network training. Our algorithm is motivated by the step size…

Machine Learning · Computer Science 2018-02-21 Thomas Frerix, Thomas Möllenhoff, Michael Moeller, Daniel Cremers

There has been an explosion of interest in designing high-performance Transformers. While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all…

Computer Vision and Pattern Recognition · Computer Science 2022-08-30 Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang

Training a neural network using the backpropagation algorithm requires passing error gradients sequentially through the network. The backward locking prevents us from updating network layers in parallel and fully leveraging the computing…

Machine Learning · Computer Science 2019-05-30 Zhouyuan Huo, Bin Gu, Heng Huang

Artificial Intelligence algorithms have been steadily increasing in popularity and usage. Deep Learning allows neural networks to be trained using huge datasets and also removes the need for human-extracted features, as it automates the…

Neural and Evolutionary Computing · Computer Science 2020-05-11 Vasco Lopes, Paulo Fazendeiro

In modern neural networks like Transformers, linear layers require significant memory to store activations during backward pass. This study proposes a memory reduction approach to perform backpropagation through linear layers. Since the…

Machine Learning · Computer Science 2022-02-04 Daniel Bershatsky, Aleksandr Mikhalev, Alexandr Katrutsa, Julia Gusak, Daniil Merkulov, Ivan Oseledets

The backpropagation of error algorithm (backprop) has been instrumental in the recent success of deep learning. However, a key question remains as to whether backprop can be formulated in a manner suitable for implementation in neural…

Neural and Evolutionary Computing · Computer Science 2020-10-13 Beren Millidge, Alexander Tschantz, Anil K Seth, Christopher L Buckley

The ever-growing scale of deep neural networks (DNNs) has led to an equally rapid growth in computational resource requirements. Many recent architectures, most prominently Large Language Models, have to be trained using supercomputers…

Machine Learning · Computer Science 2024-09-19 Daniel Barley, Holger Fröning

We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs). Our approach uses dynamic programming to balance a trade-off between caching of…

Neural and Evolutionary Computing · Computer Science 2016-06-13 Audrūnas Gruslys, Remi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves

Although backpropagation is widely accepted as a training algorithm for artificial neural networks, researchers are always looking for inspiration from the brain to find ways with potentially better performance. Forward-Forward is a novel…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Hossein Aghagolzadeh, Mehdi Ezoji

The development of the back-propagation algorithm represents a landmark in neural networks. We provide an approach that conducts the back-propagation again to reverse the traditional back-propagation process to optimize the input loss at…

Machine Learning · Computer Science 2022-02-15 Weiming Xiong, Ruoyu Yang

Naive backpropagation through time has a memory footprint that grows linearly in the sequence length, due to the need to store each state of the forward propagation. This is a problem for large networks. Strategies have been developed to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-05 Navjot Kukreja, Jan Hückelheim, Gerard J. Gorman

The backpropagation algorithm is indispensable for the training of feedforward neural networks. It requires propagating error gradients sequentially from the output layer all the way back to the input layer. The backward locking in…

Machine Learning · Computer Science 2018-07-24 Zhouyuan Huo, Bin Gu, Qian Yang, Heng Huang

The growing size of datasets and deep learning models has made faster and memory-efficient training crucial. Reversible transformers have recently been introduced as an exciting new method for extremely memory-efficient training, but they…

Machine Learning · Computer Science 2023-06-16 Tyler Zhu, Karttikeya Mangalam

We propose a memory efficient method, named Stochastic Backpropagation (SBP), for training deep neural networks on videos. It is based on the finding that gradients from incomplete execution for backpropagation can still effectively train…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Feng Cheng, Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Li, Wei Xia

Deep Neural Networks are successful but highly computationally expensive learning systems. One of the main sources of time and energy drains is the well-known backpropagation (backprop) algorithm, which roughly accounts for 2/3 of the…

Machine Learning · Computer Science 2020-04-17 Simon Wiedemann, Temesgen Mehari, Kevin Kepp, Wojciech Samek

This paper introduces Selective-Backprop, a technique that accelerates the training of deep neural networks (DNNs) by prioritizing examples with high loss at each iteration. Selective-Backprop uses the output of a training example's forward…

Training Convolutional Neural Networks (CNNs) is a resource-intensive task that requires specialized hardware for efficient computation. One of the most limiting bottlenecks of CNN training is the memory cost associated with storing the…

Computer Vision and Pattern Recognition · Computer Science 2019-10-25 Tristan Hascoet, Quentin Febvre, Yasuo Ariki, Tetsuya Takiguchi

As traditional neural networks consume a significant amount of computing resources during back propagation, Sun et al. (2017) propose a simple yet effective technique to alleviate this problem. In this technique, only a small subset…

Machine Learning · Computer Science 2017-09-27 Bingzhen Wei, Xu Sun, Xuancheng Ren, Jingjing Xu

This paper introduces a new activation checkpointing method that significantly decreases memory usage when training Deep Neural Networks with the back-propagation algorithm. Similarly to checkpointing techniques coming from the…

Machine Learning · Computer Science 2019-12-02 Julien Herrmann, Olivier Beaumont, Lionel Eyraud-Dubois, Julien Hermann, Alexis Joly, Alena Shilova