Related papers: Layerwise Optimization by Gradient Decomposition f…

Continual Learning with Scaled Gradient Projection

In neural networks, continual learning results in gradient interference among sequential tasks, leading to catastrophic forgetting of old tasks while learning new ones. This issue is addressed in recent methods by storing the important…

Machine Learning · Computer Science 2023-02-06 Gobinda Saha , Kaushik Roy

Orthogonal Gradient Descent for Continual Learning

Neural networks are achieving state of the art and sometimes super-human performance on learning tasks across a variety of domains. Whenever these problems require learning in a continual or sequential manner, however, neural networks…

Machine Learning · Computer Science 2019-10-17 Mehrdad Farajtabar , Navid Azizan , Alex Mott , Ang Li

Gradient Projection Memory for Continual Learning

The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems. Existing approaches to enable such learning in artificial neural networks usually rely on network growth, importance…

Machine Learning · Computer Science 2021-03-18 Gobinda Saha , Isha Garg , Kaushik Roy

GDOD: Effective Gradient Descent using Orthogonal Decomposition for Multi-Task Learning

Multi-task learning (MTL) aims at solving multiple related tasks simultaneously and has experienced rapid growth in recent years. However, MTL models often suffer from performance degeneration with negative transfer due to learning several…

Machine Learning · Computer Science 2023-02-01 Xin Dong , Ruize Wu , Chao Xiong , Hai Li , Lei Cheng , Yong He , Shiyou Qian , Jian Cao , Linjian Mo

Multi-Grade Deep Learning

The current deep learning model is of a single-grade, that is, it learns a deep neural network by solving a single nonconvex optimization problem. When the layer number of the neural network is large, it is computationally challenging to…

Machine Learning · Computer Science 2023-02-02 Yuesheng Xu

Meta-Learning with Warped Gradient Descent

Learning an efficient update rule from data that promotes rapid learning of new tasks from the same distribution remains an open problem in meta-learning. Typically, previous works have approached this issue either by attempting to train a…

Machine Learning · Computer Science 2020-02-19 Sebastian Flennerhag , Andrei A. Rusu , Razvan Pascanu , Francesco Visin , Hujun Yin , Raia Hadsell

Multitask Learning with Single Gradient Step Update for Task Balancing

Multitask learning is a methodology to boost generalization performance and also reduce computational intensity and memory usage. However, learning multiple tasks simultaneously can be more difficult than learning a single task because it…

Machine Learning · Computer Science 2020-06-03 Sungjae Lee , Youngdoo Son

Learning Gradient Descent: Better Generalization and Longer Horizons

Training deep neural networks is a highly nontrivial task, involving carefully selecting appropriate training algorithms, scheduling step sizes and tuning other hyperparameters. Trying different combinations can be quite labor-intensive and…

Machine Learning · Computer Science 2017-06-13 Kaifeng Lv , Shunhua Jiang , Jian Li

Collaborative Layer-wise Discriminative Learning in Deep Neural Networks

Intermediate features at different layers of a deep neural network are known to be discriminative for visual patterns of different complexities. However, most existing works ignore such cross-layer heterogeneities when classifying samples…

Computer Vision and Pattern Recognition · Computer Science 2016-07-20 Xiaojie Jin , Yunpeng Chen , Jian Dong , Jiashi Feng , Shuicheng Yan

Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models

We propose a new technique that boosts the convergence of training generative adversarial networks. Generally, the rate of training deep models reduces severely after multiple iterations. A key reason for this phenomenon is that a deep…

Machine Learning · Statistics 2018-06-15 Atsushi Nitanda , Taiji Suzuki

Regularizing Deep Multi-Task Networks using Orthogonal Gradients

Deep neural networks are a promising approach towards multi-task learning because of their capability to leverage knowledge across domains and learn general purpose representations. Nevertheless, they can fail to live up to these promises…

Machine Learning · Computer Science 2019-12-17 Mihai Suteu , Yike Guo

Delving into Effective Gradient Matching for Dataset Condensation

As deep learning models and datasets rapidly scale up, network training is extremely time-consuming and resource-costly. Instead of training on the entire dataset, learning with a small synthetic dataset becomes an efficient solution.…

Machine Learning · Computer Science 2022-08-02 Zixuan Jiang , Jiaqi Gu , Mingjie Liu , David Z. Pan

Gradient Normalization & Depth Based Decay For Deep Learning

In this paper we introduce a novel method of gradient normalization and decay with respect to depth. Our method leverages the simple concept of normalizing all gradients in a deep neural network, and then decaying said gradients with…

Machine Learning · Computer Science 2018-03-01 Robert Kwiatkowski , Oscar Chang

Balanced Gradient Sample Retrieval for Enhanced Knowledge Retention in Proxy-based Continual Learning

Continual learning in deep neural networks often suffers from catastrophic forgetting, where representations for previous tasks are overwritten during subsequent training. We propose a novel sample retrieval strategy from the memory buffer…

Machine Learning · Computer Science 2024-12-20 Hongye Xu , Jan Wasilewski , Bartosz Krawczyk

Gradient-free Continual Learning

Continual learning (CL) presents a fundamental challenge in training neural networks on sequential tasks without experiencing catastrophic forgetting. Traditionally, the dominant approach in CL has been gradient-based optimization, where…

Machine Learning · Computer Science 2025-04-03 Grzegorz Rypeść

Towards Differentiable Multilevel Optimization: A Gradient-Based Approach

Multilevel optimization has gained renewed interest in machine learning due to its promise in applications such as hyperparameter tuning and continual learning. However, existing methods struggle with the inherent difficulty of efficiently…

Machine Learning · Computer Science 2024-10-16 Yuntian Gu , Xuzheng Chen

Topological Gradient-based Competitive Learning

Topological learning is a wide research area aiming at uncovering the mutual spatial relationships between the elements of a set. Some of the most common and oldest approaches involve the use of unsupervised competitive neural networks.…

Machine Learning · Statistics 2021-11-03 Pietro Barbiero , Gabriele Ciravegna , Vincenzo Randazzo , Giansalvo Cirrincione

Learned Weight Sharing for Deep Multi-Task Learning by Natural Evolution Strategy and Stochastic Gradient Descent

In deep multi-task learning, weights of task-specific networks are shared between tasks to improve performance on each single one. Since the question, which weights to share between layers, is difficult to answer, human-designed…

Machine Learning · Computer Science 2020-03-24 Jonas Prellberg , Oliver Kramer

Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning

Multilingual models jointly pretrained on multiple languages have achieved remarkable performance on various multilingual downstream tasks. Moreover, models finetuned on a single monolingual downstream task have shown to generalize to…

Computation and Language · Computer Science 2022-03-01 Seanie Lee , Hae Beom Lee , Juho Lee , Sung Ju Hwang

Learned Gradient Compression for Distributed Deep Learning

Training deep neural networks on large datasets containing high-dimensional data requires a large amount of computation. A solution to this problem is data-parallel distributed training, where a model is replicated into several…

Machine Learning · Computer Science 2021-03-18 Lusine Abrahamyan , Yiming Chen , Giannis Bekoulis , Nikos Deligiannis