Related papers: Improved Binary Forward Exploration: Learning Rate…

BFE and AdaBFE: A New Approach in Learning Rate Automation for Stochastic Optimization

In this paper, a new gradient-based optimization approach by automatically adjusting the learning rate is proposed. This approach can be applied to design non-adaptive learning rate and adaptive learning rate. Firstly, I will introduce the…

Machine Learning · Computer Science 2022-07-07 Xin Cao

Binary Search and First Order Gradient Based Method for Stochastic Optimization

In this paper, we present a novel stochastic optimization method, which uses the binary search technique with first order gradient based optimization method, called Binary Search Gradient Optimization (BSG) or BiGrad. In this optimization…

Machine Learning · Computer Science 2020-07-28 Vijay Pandey

Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates

Adaptive gradient methods for stochastic optimization adjust the learning rate for each parameter locally. However, there is also a global learning rate which must be tuned in order to get the best performance. In this paper, we present a…

Machine Learning · Computer Science 2018-06-12 Hiroaki Hayashi, Jayanth Koushik, Graham Neubig

A Robust Adaptive Stochastic Gradient Method for Deep Learning

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

An Adaptive Gradient Method with Energy and Momentum

We introduce a novel algorithm for gradient-based optimization of stochastic objective functions. The method may be seen as a variant of SGD with momentum equipped with an adaptive learning rate automatically adjusted by an 'energy'…

Optimization and Control · Mathematics 2022-03-24 Hailiang Liu, Xuping Tian

Adaptive Optimization with Examplewise Gradients

We propose a new, more general approach to the design of stochastic gradient-based optimization methods for machine learning. In this new framework, optimizers assume access to a batch of gradient estimates per iteration, rather than a…

Machine Learning · Computer Science 2021-12-02 Julius Kunze, James Townsend, David Barber

Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning

Conditional stochastic optimization covers a variety of applications ranging from invariant learning and causal inference to meta-learning. However, constructing unbiased gradient estimators for such problems is challenging due to the…

Optimization and Control · Mathematics 2024-06-04 Yifan Hu, Siqi Zhang, Xin Chen, Niao He

ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient

Stochastic gradient algorithms have been the main focus of large-scale learning problems and they led to important successes in machine learning. The convergence of SGD depends on the careful choice of learning rate and the amount of the…

Machine Learning · Computer Science 2015-11-03 Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio

Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic

This paper proposes SplitSGD, a new dynamic learning rate schedule for stochastic optimization. This method decreases the learning rate for better adaptation to the local geometry of the objective function whenever a stationary phase is…

Machine Learning · Statistics 2024-02-20 Matteo Sordello, Niccolò Dalmasso, Hangfeng He, Weijie Su

Stochastic Learning Approach to Binary Optimization for Optimal Design of Experiments

We present a novel stochastic approach to binary optimization for optimal experimental design (OED) for Bayesian inverse problems governed by mathematical models such as partial differential equations. The OED utility function, namely, the…

Optimization and Control · Mathematics 2022-06-28 Ahmed Attia, Sven Leyffer, Todd Munson

We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual Learning Rate And Beyond

In the rapidly advancing field of deep learning, optimising deep neural networks is paramount. This paper introduces a novel method, Enhanced Velocity Estimation (EVE), which innovatively applies different learning rates to distinct…

Machine Learning · Computer Science 2023-08-22 Afshin Khadangi

Online Learning Rate Adaptation with Hypergradient Descent

We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by…

Machine Learning · Computer Science 2018-08-23 Atilim Gunes Baydin, Robert Cornish, David Martinez Rubio, Mark Schmidt, Frank Wood

Lookahead Optimizer: k steps forward, 1 step back

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate…

Machine Learning · Computer Science 2019-12-04 Michael R. Zhang, James Lucas, Geoffrey Hinton, Jimmy Ba

Bolstering Stochastic Gradient Descent with Model Building

Stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates for solving machine learning problems. These rates are obtained especially when these algorithms are…

Machine Learning · Computer Science 2024-03-14 S. Ilker Birbil, Ozgur Martin, Gonenc Onay, Figen Oztoprak

Adaptive Batch Size and Learning Rate Scheduler for Stochastic Gradient Descent Based on Minimization of Stochastic First-order Oracle Complexity

The convergence behavior of mini-batch stochastic gradient descent (SGD) is highly sensitive to the batch size and learning rate settings. Recent theoretical studies have identified the existence of a critical batch size that minimizes…

Machine Learning · Computer Science 2025-08-08 Hikaru Umeda, Hideaki Iiduka

Nesterov's method with decreasing learning rate leads to accelerated stochastic gradient descent

We present a coupled system of ODEs which, when discretized with a constant time step/learning rate, recovers Nesterov's accelerated gradient descent algorithm. The same ODEs, when discretized with a decreasing learning rate, leads to novel…

Optimization and Control · Mathematics 2020-09-02 Maxime Laborde, Adam M. Oberman

Stochastic Learning Rate Optimization in the Stochastic Approximation and Online Learning Settings

In this work, multiplicative stochasticity is applied to the learning rate of stochastic optimization algorithms, giving rise to stochastic learning-rate schemes. In-expectation theoretical convergence results of Stochastic Gradient Descent…

Optimization and Control · Mathematics 2022-03-22 Theodoros Mamalis, Dusan Stipanovic, Petros Voulgaris

Learning Rate Adaptation for Federated and Differentially Private Learning

We propose an algorithm for the adaptation of the learning rate for stochastic gradient descent (SGD) that avoids the need for validation set use. The idea for the adaptiveness comes from the technique of extrapolation: to get an estimate…

Machine Learning · Statistics 2020-08-28 Antti Koskela, Antti Honkela

A Dynamic Sampling Adaptive-SGD Method for Machine Learning

We propose a stochastic optimization method for minimizing loss functions, expressed as an expected value, that adaptively controls the batch size used in the computation of gradient approximations and the step size used to move along such…

Machine Learning · Computer Science 2020-03-04 Achraf Bahamou, Donald Goldfarb

EAGLE: Early Approximated Gradient Based Learning Rate Estimator

We propose EAGLE update rule, a novel optimization method that accelerates loss convergence during the early stages of training by leveraging both current and previous step parameter and gradient values. The update algorithm estimates…

Machine Learning · Computer Science 2025-02-04 Takumi Fujimoto, Hiroaki Nishi