Related papers: Blockwise Self-Supervised Learning at Scale

Block-wise Training of Residual Networks via the Minimizing Movement Scheme

End-to-end backpropagation has a few shortcomings: it requires loading the entire model during training, which can be impossible in constrained settings, and suffers from three locking problems (forward locking, update locking and backward…

Machine Learning · Computer Science 2023-06-07 Skander Karkar, Ibrahim Ayed, Emmanuel de Bézenac, Patrick Gallinari

Block-local learning with probabilistic latent representations

The ubiquitous backpropagation algorithm requires sequential updates through the network introducing a locking problem. In addition, back-propagation relies on the transpose of forward weight matrices to compute updates, introducing a…

Machine Learning · Computer Science 2023-10-31 David Kappel, Khaleelulla Khan Nazeer, Cabrel Teguemne Fokam, Christian Mayr, Anand Subramoney

Big Self-Supervised Models are Strong Semi-Supervised Learners

One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way,…

Machine Learning · Computer Science 2020-10-27 Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton

Depth-Wise Representation Development Under Blockwise Self-Supervised Learning for Video Vision Transformers

End-to-end backpropagation couples all layers through a global error signal, enabling coordinated learning but requiring long-range credit assignment. Motivated by recent progress in blockwise self-supervised learning (BWSSL), we ask…

Computer Vision and Pattern Recognition · Computer Science 2026-01-15 Jonas Römer, Timo Dickscheid

BackLink: Supervised Local Training with Backward Links

Empowered by the backpropagation (BP) algorithm, deep neural networks have dominated the race in solving various cognitive tasks. The restricted training pattern in the standard BP requires end-to-end error propagation, causing large memory…

Machine Learning · Computer Science 2022-05-17 Wenzhe Guo, Mohammed E Fouda, Ahmed M. Eltawil, Khaled N. Salama

On the Stepwise Nature of Self-Supervised Learning

We present a simple picture of the training process of joint embedding self-supervised learning methods. We find that these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated…

Machine Learning · Computer Science 2023-05-31 James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht

Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks

Backpropagation (BP) has been a successful optimization technique for deep learning models. However, its limitations, such as backward- and update-locking, and its biological implausibility, hinder the concurrent updating of layers and do…

Machine Learning · Computer Science 2023-12-22 Anzhe Cheng, Zhenkun Wang, Chenzhong Yin, Mingxi Cheng, Heng Ping, Xiongye Xiao, Shahin Nazarian, Paul Bogdan

Self-Supervised Training Enhances Online Continual Learning

In continual learning, a system must incrementally learn from a non-stationary data stream without catastrophic forgetting. Recently, multiple methods have been devised for incrementally learning classes on large-scale image classification…

Computer Vision and Pattern Recognition · Computer Science 2021-10-25 Jhair Gallardo, Tyler L. Hayes, Christopher Kanan

Training Neural Networks with Local Error Signals

Supervised training of neural networks for classification is typically performed with a global loss function. The loss function provides a gradient for the output layer, and this gradient is back-propagated to hidden layers to dictate an…

Machine Learning · Statistics 2019-05-09 Arild Nøkland, Lars Hiller Eidnes

Improving Reliability of Fine-tuning with Block-wise Optimisation

Finetuning can be used to tackle domain-specific tasks by transferring knowledge. Previous studies on finetuning focused on adapting only the weights of a task-specific classifier or re-optimizing all layers of the pre-trained model using…

Machine Learning · Computer Science 2023-01-18 Basel Barakat, Qiang Huang

Deep supervised learning using local errors

Error backpropagation is a highly effective mechanism for learning high-quality hierarchical features in deep networks. Updating the features or weights in one layer, however, requires waiting for the propagation of error signals from…

Neural and Evolutionary Computing · Computer Science 2017-11-21 Hesham Mostafa, Vishwajith Ramesh, Gert Cauwenberghs

Alpha-Net: Architecture, Models, and Applications

Deep learning network training is usually computationally expensive and intuitively complex. We present a novel network architecture for custom training and weight evaluations. We reformulate the layers as ResNet-similar blocks with certain…

Computer Vision and Pattern Recognition · Computer Science 2020-07-15 Jishan Shaikh, Adya Sharma, Ankit Chouhan, Avinash Mahawar

1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities

Scaling up self-supervised learning has driven breakthroughs in language and vision, yet comparable progress has remained elusive in reinforcement learning (RL). In this paper, we study building blocks for self-supervised RL that unlock…

Machine Learning · Computer Science 2026-02-03 Kevin Wang, Ishaan Javali, Michał Bortkiewicz, Tomasz Trzciński, Benjamin Eysenbach

Stochastic Layer-wise Learning: Scalable and Efficient Alternative to Backpropagation

Backpropagation underpins modern deep learning, yet its reliance on global gradient synchronization limits scalability and incurs high memory costs. In contrast, fully local learning rules are more efficient but often struggle to maintain…

Machine Learning · Computer Science 2025-10-01 Bojian Yin, Federico Corradi

RRR-Net: Reusing, Reducing, and Recycling a Deep Backbone Network

It has become mainstream in computer vision and other machine learning domains to reuse backbone networks pre-trained on large datasets as preprocessors. Typically, the last layer is replaced by a shallow learning machine of sorts; the…

Machine Learning · Computer Science 2023-10-03 Haozhe Sun, Isabelle Guyon, Felix Mohr, Hedi Tabia

Why Layer-Wise Learning is Hard to Scale-up and a Possible Solution via Accelerated Downsampling

Layer-wise learning, as an alternative to global back-propagation, is easy to interpret, analyze, and it is memory efficient. Recent studies demonstrate that layer-wise learning can achieve state-of-the-art performance in image…

Computer Vision and Pattern Recognition · Computer Science 2020-10-19 Wenchi Ma, Miao Yu, Kaidong Li, Guanghui Wang

Local Critic Training of Deep Neural Networks

This paper proposes a novel approach to train deep neural networks by unlocking the layer-wise dependency of backpropagation training. The approach employs additional modules called local critic networks besides the main network model to be…

Machine Learning · Computer Science 2018-09-28 Hojung Lee, Jong-seok Lee

Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment

Training deep neural networks on large-scale datasets requires significant hardware resources whose costs (even on cloud platforms) put them out of reach of smaller organizations, groups, and individuals. Backpropagation, the workhorse for…

Machine Learning · Computer Science 2020-09-22 Alexander Ororbia, Ankur Mali, Daniel Kifer, C. Lee Giles

Towards Scaling Deep Neural Networks with Predictive Coding: Theory and Practice

Backpropagation (BP) is the standard algorithm for training the deep neural networks that power modern artificial intelligence including large language models. However, BP is energy inefficient and unlikely to be implemented by the brain.…

Machine Learning · Computer Science 2025-10-30 Francesco Innocenti

Blockwise Parallel Decoding for Deep Autoregressive Models

Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. While common architecture classes such as recurrent, convolutional, and self-attention networks make…

Machine Learning · Computer Science 2018-11-09 Mitchell Stern, Noam Shazeer, Jakob Uszkoreit