Related papers: Gradient-based Hyperparameter Optimization Over Lo…

Distance-Forward Learning: Enhancing the Forward-Forward Algorithm Towards High-Performance On-Chip Learning

The Forward-Forward (FF) algorithm was recently proposed as a local learning method to address the limitations of backpropagation (BP), offering biological plausibility along with memory-efficient and highly parallelized computational…

Neural and Evolutionary Computing · Computer Science 2024-08-28 Yujie Wu, Siyuan Xu, Jibin Wu, Lei Deng, Mingkun Xu, Qinghao Wen, Guoqi Li

Forward and Reverse Gradient-Based Hyperparameter Optimization

We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent. These procedures mirror…

Machine Learning · Statistics 2017-12-13 Luca Franceschi, Michele Donini, Paolo Frasconi, Massimiliano Pontil

Enhancing Fractional Gradient Descent with Learned Optimizers

Fractional Gradient Descent (FGD) offers a novel and promising way to accelerate optimization by incorporating fractional calculus into machine learning. Although FGD has shown encouraging initial results across various optimization tasks,…

Machine Learning · Computer Science 2025-10-22 Jan Sobotka, Petr Šimánek, Pavel Kordík

Scalable Meta-Learning via Mixed-Mode Differentiation

Gradient-based bilevel optimisation is a powerful technique with applications in hyperparameter optimisation, task adaptation, algorithm discovery, meta-learning more broadly, and beyond. It often requires differentiating through the…

Machine Learning · Computer Science 2025-06-11 Iurii Kemaev, Dan A Calian, Luisa M Zintgraf, Gregory Farquhar, Hado van Hasselt

Scaling Forward Gradient With Local Losses

Forward gradient learning computes a noisy directional gradient and is a biologically plausible alternative to backprop for learning deep neural networks. However, the standard forward gradient algorithm, when applied naively, suffers from…

Machine Learning · Computer Science 2023-03-03 Mengye Ren, Simon Kornblith, Renjie Liao, Geoffrey Hinton

FLOPS: Forward Learning with OPtimal Sampling

Given the limitations of backpropagation, perturbation-based gradient computation methods have recently gained focus for learning with only forward passes, also referred to as queries. Conventional forward learning consumes enormous queries…

Machine Learning · Computer Science 2025-03-11 Tao Ren, Zishi Zhang, Jinyang Jiang, Guanghao Li, Zeliang Zhang, Mingqian Feng, Yijie Peng

GD-FPS: Growth-Driven Feedforward Parameter Selection for Efficient Fine-Tuning

Parameter-Efficient Fine-Tuning (PEFT) has emerged as a key strategy for adapting large-scale pre-trained models to downstream tasks, but existing approaches face notable limitations. Addition-based methods, such as Adapters, introduce…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Kenneth Yang, Wen-Li Wei, Jen-Chun Lin

Efficient Learning of Generative Models via Finite-Difference Score Matching

Several machine learning applications involve the optimization of higher-order derivatives (e.g., gradients of gradients) during training, which can be expensive with respect to memory and computation even with automatic differentiation. As a…

Machine Learning · Computer Science 2020-11-26 Tianyu Pang, Kun Xu, Chongxuan Li, Yang Song, Stefano Ermon, Jun Zhu

Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach

Federated learning (FL) is an emerging technique for training machine learning models using geographically dispersed data collected by local entities. It includes local computation and synchronization steps. To reduce the communication…

Machine Learning · Computer Science 2020-03-23 Pengchao Han, Shiqiang Wang, Kin K. Leung

Gradient-based Quadratic Multiform Separation

Classification, a supervised learning task, is an important topic in machine learning. It aims to categorize a set of data into classes. Several classification methods are in common use today, such as k-nearest neighbors,…

Machine Learning · Statistics 2021-10-27 Wen-Teng Chang

Faster Meta Update Strategy for Noise-Robust Deep Learning

It has been shown that deep neural networks are prone to overfitting on biased training data. Towards addressing this issue, meta-learning employs a meta model for correcting the training bias. Despite the promising performances, super slow…

Machine Learning · Computer Science 2021-05-03 Youjiang Xu, Linchao Zhu, Lu Jiang, Yi Yang

Fast and Slow Gradient Approximation for Binary Neural Network Optimization

Binary Neural Networks (BNNs) have garnered significant attention due to their immense potential for deployment on edge devices. However, the non-differentiability of the quantization function poses a challenge for the optimization of BNNs,…

Machine Learning · Computer Science 2024-12-17 Xinquan Chen, Junqi Gao, Biqing Qi, Dong Li, Yiang Luo, Fangyuan Li, Pengfei Li

Towards Differentiable Multilevel Optimization: A Gradient-Based Approach

Multilevel optimization has gained renewed interest in machine learning due to its promise in applications such as hyperparameter tuning and continual learning. However, existing methods struggle with the inherent difficulty of efficiently…

Machine Learning · Computer Science 2024-10-16 Yuntian Gu, Xuzheng Chen

Gradient-based Parameter Selection for Efficient Fine-Tuning

With the growing size of pre-trained models, full fine-tuning and storing all the parameters for various downstream tasks is costly and infeasible. In this paper, we propose a new parameter-efficient fine-tuning method, Gradient-based…

Computer Vision and Pattern Recognition · Computer Science 2024-12-02 Zhi Zhang, Qizhe Zhang, Zijun Gao, Renrui Zhang, Ekaterina Shutova, Shiji Zhou, Shanghang Zhang

Can Forward Gradient Match Backpropagation?

Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient…

Machine Learning · Computer Science 2023-06-13 Louis Fournier, Stéphane Rivaud, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

Simple and Effective Gradient-Based Tuning of Sequence-to-Sequence Models

Recent trends towards training ever-larger language models have substantially improved machine learning performance across linguistic tasks. However, the huge cost of training larger models can make tuning them prohibitively expensive,…

Computation and Language · Computer Science 2022-09-13 Jared Lichtarge, Chris Alberti, Shankar Kumar

Gradient-free neural topology optimization: Towards effective fracture-resistant designs

Gradient-free optimizers allow for tackling problems regardless of the smoothness or differentiability of their objective function, but they require many more iterations to converge when compared to gradient-based algorithms. This has made…

Machine Learning · Computer Science 2024-09-24 Gawel Kus, Miguel A. Bessa

Gradient Sparsification for Efficient Wireless Federated Learning with Differential Privacy

Federated learning (FL) enables distributed clients to collaboratively train a machine learning model without sharing raw data with each other. However, it suffers from leakage of private information when models are uploaded. In addition, as…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-12-25 Kang Wei, Jun Li, Chuan Ma, Ming Ding, Feng Shu, Haitao Zhao, Wen Chen, Hongbo Zhu

Online Hyperparameter Meta-Learning with Hypergradient Distillation

Many gradient-based meta-learning methods assume a set of parameters that do not participate in inner-optimization, which can be considered as hyperparameters. Although such hyperparameters can be optimized using the existing gradient-based…

Machine Learning · Computer Science 2022-02-15 Hae Beom Lee, Hayeon Lee, Jaewoong Shin, Eunho Yang, Timothy Hospedales, Sung Ju Hwang

Learn More, Forget Less: A Gradient-Aware Data Selection Approach for LLM

Although large language models (LLMs) have achieved impressive results across numerous tasks, supervised fine-tuning (SFT) remains essential for adapting these models to specialized domains. However, SFT for domain specialization can be…

Computation and Language · Computer Science 2025-11-13 Yibai Liu, Shihang Wang, Zeming Liu, Zheming Song, Junzhe Wang, Jingjing Liu, Qingjie Liu, Yunhong Wang