Related papers: Backprop with Approximate Activations for Memory-e…

Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation

Fine-tuning pretrained large models to downstream tasks is an important problem, but it suffers from huge memory overhead due to large-scale parameters. This work strives to reduce memory overhead in fine-tuning from perspectives of…

Machine Learning · Computer Science 2024-06-25 Yuchen Yang, Yingdong Shi, Cheems Wang, Xiantong Zhen, Yuxuan Shi, Jun Xu

Proximal Backpropagation

We propose proximal backpropagation (ProxProp) as a novel algorithm that takes implicit instead of explicit gradient steps to update the network parameters during neural network training. Our algorithm is motivated by the step size…

Machine Learning · Computer Science 2018-02-21 Thomas Frerix, Thomas Möllenhoff, Michael Moeller, Daniel Cremers

Mesa: A Memory-saving Training Framework for Transformers

There has been an explosion of interest in designing high-performance Transformers. While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all…

Computer Vision and Pattern Recognition · Computer Science 2022-08-30 Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang

Training Neural Networks Using Features Replay

Training a neural network using the backpropagation algorithm requires passing error gradients sequentially through the network. The backward locking prevents us from updating network layers in parallel and fully leveraging the computing…

Machine Learning · Computer Science 2019-05-30 Zhouyuan Huo, Bin Gu, Heng Huang

A Hybrid Method for Training Convolutional Neural Networks

Artificial Intelligence algorithms have been steadily increasing in popularity and usage. Deep Learning allows neural networks to be trained using huge datasets and also removes the need for human-extracted features, as it automates the…

Neural and Evolutionary Computing · Computer Science 2020-05-11 Vasco Lopes, Paulo Fazendeiro

Memory-Efficient Backpropagation through Large Linear Layers

In modern neural networks like Transformers, linear layers require significant memory to store activations during the backward pass. This study proposes a memory reduction approach to perform backpropagation through linear layers. Since the…

Machine Learning · Computer Science 2022-02-04 Daniel Bershatsky, Aleksandr Mikhalev, Alexandr Katrutsa, Julia Gusak, Daniil Merkulov, Ivan Oseledets

Activation Relaxation: A Local Dynamical Approximation to Backpropagation in the Brain

The backpropagation of error algorithm (backprop) has been instrumental in the recent success of deep learning. However, a key question remains as to whether backprop can be formulated in a manner suitable for implementation in neural…

Neural and Evolutionary Computing · Computer Science 2020-10-13 Beren Millidge, Alexander Tschantz, Anil K Seth, Christopher L Buckley

Less Memory Means smaller GPUs: Backpropagation with Compressed Activations

The ever-growing scale of deep neural networks (DNNs) has led to an equally rapid growth in computational resource requirements. Many recent architectures, most prominently Large Language Models, have to be trained using supercomputers…

Machine Learning · Computer Science 2024-09-19 Daniel Barley, Holger Fröning

Memory-Efficient Backpropagation Through Time

We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs). Our approach uses dynamic programming to balance a trade-off between caching of…

Neural and Evolutionary Computing · Computer Science 2016-06-13 Audrūnas Gruslys, Remi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves

Contrastive Forward-Forward: A Training Algorithm of Vision Transformer

Although backpropagation is widely accepted as a training algorithm for artificial neural networks, researchers are always looking for inspiration from the brain to find ways with potentially better performance. Forward-Forward is a novel…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Hossein Aghagolzadeh, Mehdi Ezoji

Reverse Back Propagation to Make Full Use of Derivative

The development of the back-propagation algorithm represents a landmark in neural networks. We provide an approach that conducts the back-propagation again to reverse the traditional back-propagation process to optimize the input loss at…

Machine Learning · Computer Science 2022-02-15 Weiming Xiong, Ruoyu Yang

Backpropagation for long sequences: beyond memory constraints with constant overheads

Naive backpropagation through time has a memory footprint that grows linearly in the sequence length, due to the need to store each state of the forward propagation. This is a problem for large networks. Strategies have been developed to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-05 Navjot Kukreja, Jan Hückelheim, Gerard J. Gorman

Decoupled Parallel Backpropagation with Convergence Guarantee

The backpropagation algorithm is indispensable for the training of feedforward neural networks. It requires propagating error gradients sequentially from the output layer all the way back to the input layer. The backward locking in…

Machine Learning · Computer Science 2018-07-24 Zhouyuan Huo, Bin Gu, Qian Yang, Heng Huang

PaReprop: Fast Parallelized Reversible Backpropagation

The growing size of datasets and deep learning models has made faster and memory-efficient training crucial. Reversible transformers have recently been introduced as an exciting new method for extremely memory-efficient training, but they…

Machine Learning · Computer Science 2023-06-16 Tyler Zhu, Karttikeya Mangalam

Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models

We propose a memory efficient method, named Stochastic Backpropagation (SBP), for training deep neural networks on videos. It is based on the finding that gradients from incomplete execution for backpropagation can still effectively train…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Feng Cheng, Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Li, Wei Xia

Dithered backprop: A sparse and quantized backpropagation algorithm for more efficient deep neural network training

Deep Neural Networks are successful but highly computationally expensive learning systems. One of the main sources of time and energy drains is the well known backpropagation (backprop) algorithm, which roughly accounts for 2/3 of the…

Machine Learning · Computer Science 2020-04-17 Simon Wiedemann, Temesgen Mehari, Kevin Kepp, Wojciech Samek

Accelerating Deep Learning by Focusing on the Biggest Losers

This paper introduces Selective-Backprop, a technique that accelerates the training of deep neural networks (DNNs) by prioritizing examples with high loss at each iteration. Selective-Backprop uses the output of a training example's forward…

Machine Learning · Computer Science 2019-10-03 Angela H. Jiang, Daniel L.-K. Wong, Giulio Zhou, David G. Andersen, Jeffrey Dean, Gregory R. Ganger, Gauri Joshi, Michael Kaminsky, Michael Kozuch, Zachary C. Lipton, Padmanabhan Pillai

Reversible designs for extreme memory cost reduction of CNN training

Training Convolutional Neural Networks (CNNs) is a resource-intensive task that requires specialized hardware for efficient computation. One of the most limiting bottlenecks of CNN training is the memory cost associated with storing the…

Computer Vision and Pattern Recognition · Computer Science 2019-10-25 Tristan Hascoet, Quentin Febvre, Yasuo Ariki, Tetsuya Takiguchi

Minimal Effort Back Propagation for Convolutional Neural Networks

As traditional neural networks consume a significant amount of computing resources during back propagation, Sun et al. (2017) propose a simple yet effective technique to alleviate this problem. In this technique, only a small subset…

Machine Learning · Computer Science 2017-09-27 Bingzhen Wei, Xu Sun, Xuancheng Ren, Jingjing Xu

Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory

This paper introduces a new activation checkpointing method that significantly decreases memory usage when training Deep Neural Networks with the back-propagation algorithm. Similarly to checkpointing techniques coming from the…

Machine Learning · Computer Science 2019-12-02 Julien Herrmann, Olivier Beaumont, Lionel Eyraud-Dubois, Julien Hermann, Alexis Joly, Alena Shilova