Related papers: FLOPS: Forward Learning with OPtimal Sampling

Beyond Backpropagation: Optimization with Multi-Tangent Forward Gradients

The gradients used to train neural networks are typically computed using backpropagation. While an efficient way to obtain exact gradients, backpropagation is computationally expensive, hinders parallelization, and is biologically…

Machine Learning · Computer Science 2026-01-14 Katharina Flügel, Daniel Coquelin, Marie Weiel, Charlotte Debus, Achim Streit, Markus Götz

Gradient-based Hyperparameter Optimization Over Long Horizons

Gradient-based hyperparameter optimization has earned a widespread popularity in the context of few-shot meta-learning, but remains broadly impractical for tasks with long horizons (many gradient steps), due to memory scaling and gradient…

Machine Learning · Computer Science 2021-10-01 Paul Micaelli, Amos Storkey

Efficient Backpropagation with Variance-Controlled Adaptive Sampling

Sampling-based algorithms, which eliminate "unimportant" computations during forward and/or back propagation (BP), offer potential solutions to accelerate neural network training. However, since sampling introduces approximations to…

Machine Learning · Computer Science 2024-02-28 Ziteng Wang, Jianfei Chen, Jun Zhu

FAMO: Fast Adaptive Multitask Optimization

One of the grand enduring goals of AI is to create generalist agents that can learn multiple different tasks from diverse data via multitask learning (MTL). However, in practice, applying gradient descent (GD) on the average loss across all…

Machine Learning · Computer Science 2023-10-31 Bo Liu, Yihao Feng, Peter Stone, Qiang Liu

FLOPs as a Direct Optimization Objective for Learning Sparse Neural Networks

There exists a plethora of techniques for inducing structured sparsity in parametric models during the optimization process, with the final goal of resource-efficient inference. However, few methods target a specific number of…

Machine Learning · Computer Science 2018-11-26 Raphael Tang, Ashutosh Adhikari, Jimmy Lin

First order online optimisation using forward gradients in over-parameterised systems

The success of deep learning over the past decade mainly relies on gradient-based optimisation and backpropagation. This paper focuses on analysing the performance of first-order gradient-based optimisation algorithms, gradient descent and…

Optimization and Control · Mathematics 2022-12-08 Behnam Mafakheri, Iman Shames, Jonathan H. Manton

Can Forward Gradient Match Backpropagation?

Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient…

Machine Learning · Computer Science 2023-06-13 Louis Fournier, Stéphane Rivaud, Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

Optimization without Backpropagation

Forward gradients have been recently introduced to bypass backpropagation in autodifferentiation, while retaining unbiased estimators of true gradients. We derive an optimality condition to obtain best approximating forward gradients, which…

Machine Learning · Computer Science 2022-09-15 Gabriel Belouze

Simulating, Fast and Slow: Learning Policies for Black-Box Optimization

In recent years, solving optimization problems involving black-box simulators has become a point of focus for the machine learning community due to their ubiquity in science and engineering. The simulators describe a forward process…

Machine Learning · Computer Science 2024-06-07 Fabio Valerio Massoli, Tim Bakker, Thomas Hehn, Tribhuvanesh Orekondy, Arash Behboodi

Towards Scalable Backpropagation-Free Gradient Estimation

While backpropagation (reverse-mode automatic differentiation) has been extraordinarily successful in deep learning, it requires two passes (forward and backward) through the neural network and the storage of intermediate activations.…

Machine Learning · Computer Science 2025-11-06 Daniel Wang, Evan Markou, Dylan Campbell

Lotus: Efficient LLM Training by Randomized Low-Rank Gradient Projection with Adaptive Subspace Switching

Training efficiency in large-scale models is typically assessed through memory consumption, training time, and model performance. Current methods often exhibit trade-offs among these metrics, as optimizing one generally degrades at least…

Machine Learning · Computer Science 2026-02-03 Tianhao Miao, Zhongyuan Bao, Lejun Zhang

Joint Sampling and Optimisation for Inverse Rendering

When dealing with difficult inverse problems such as inverse rendering, using Monte Carlo estimated gradients to optimise parameters can slow down convergence due to variance. Averaging many gradient samples in each iteration reduces this…

Graphics · Computer Science 2023-09-28 Martin Balint, Karol Myszkowski, Hans-Peter Seidel, Gurprit Singh

Deep Learning for Optimization of Trajectories for Quadrotors

This paper presents a novel learning-based trajectory planning framework for quadrotors that combines model-based optimization techniques with deep learning. Specifically, we formulate the trajectory optimization problem as a quadratic…

Robotics · Computer Science 2023-12-05 Yuwei Wu, Xiatao Sun, Igor Spasojevic, Vijay Kumar

A Historical Trajectory Assisted Optimization Method for Zeroth-Order Federated Learning

Federated learning heavily relies on distributed gradient descent techniques. In the situation where gradient information is not available, the gradients need to be estimated from zeroth-order information, which typically involves computing…

Machine Learning · Computer Science 2024-10-25 Chenlin Wu, Xiaoyu He, Zike Li, Jing Gong, Zibin Zheng

Projection-Free Adaptive Gradients for Large-Scale Optimization

The complexity in large-scale optimization can lie in both handling the objective function and handling the constraint set. In this respect, stochastic Frank-Wolfe algorithms occupy a unique position as they alleviate both computational…

Optimization and Control · Mathematics 2021-02-16 Cyrille W. Combettes, Christoph Spiegel, Sebastian Pokutta

On Scalable and Efficient Computation of Large Scale Optimal Transport

Optimal Transport (OT) naturally arises in many machine learning applications, yet the heavy computational burden limits its wide-spread uses. To address the scalability issue, we propose an implicit generative learning-based framework…

Machine Learning · Computer Science 2019-06-26 Yujia Xie, Minshuo Chen, Haoming Jiang, Tuo Zhao, Hongyuan Zha

Gradients without Backpropagation

Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic…

Machine Learning · Computer Science 2022-02-18 Atılım Güneş Baydin, Barak A. Pearlmutter, Don Syme, Frank Wood, Philip Torr

One Backward from Ten Forward, Subsampling for Large-Scale Deep Learning

Deep learning models in large-scale machine learning systems are often continuously trained with enormous data from production environments. The sheer volume of streaming training data poses a significant challenge to real-time training…

Machine Learning · Computer Science 2021-04-28 Chaosheng Dong, Xiaojie Jin, Weihao Gao, Yijia Wang, Hongyi Zhang, Xiang Wu, Jianchao Yang, Xiaobing Liu

Gradient Methods with Memory

In this paper, we consider gradient methods for minimizing smooth convex functions, which employ the information obtained at the previous iterations in order to accelerate the convergence towards the optimal solution. This information is…

Optimization and Control · Mathematics 2021-06-02 Yurii Nesterov, Mihai I. Florea

Reducing Variance in Meta-Learning via Laplace Approximation for Regression Tasks

Given a finite set of sample points, meta-learning algorithms aim to learn an optimal adaptation strategy for new, unseen tasks. Often, this data can be ambiguous as it might belong to different tasks concurrently. This is particularly the…

Machine Learning · Computer Science 2024-10-24 Alfredo Reichlin, Gustaf Tegnér, Miguel Vasco, Hang Yin, Mårten Björkman, Danica Kragic