Related papers: Adaptive Stopping Rule for Kernel-based Gradient D…

Beyond Cross-Validation: Adaptive Parameter Selection for Kernel-Based Gradient Descents

This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify…

Machine Learning · Statistics 2026-03-05 Xiaotong Liu , Yunwen Lei , Xiangyu Chang , Shao-Bo Lin

Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning

We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or…

Machine Learning · Computer Science 2022-08-08 Serge Kas Hanna , Rawad Bitar , Parimal Parag , Venkat Dasari , Salim El Rouayheb

Adaptive Kernel Selection for Stein Variational Gradient Descent

A central challenge in Bayesian inference is efficiently approximating posterior distributions. Stein Variational Gradient Descent (SVGD) is a popular variational inference method which transports a set of particles to approximate a target…

Machine Learning · Statistics 2025-12-05 Moritz Melcher , Simon Weissmann , Ashia C. Wilson , Jakob Zech

Early stopping and non-parametric regression: An optimal data-dependent stopping rule

The strategy of early stopping is a regularization technique based on choosing a stopping time for an iterative algorithm. Focusing on non-parametric regression in a reproducing kernel Hilbert space, we analyze the early stopping strategy…

Machine Learning · Statistics 2013-06-18 Garvesh Raskutti , Martin J. Wainwright , Bin Yu

Adaptive Top-K in SGD for Communication-Efficient Distributed Learning

Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification,…

Machine Learning · Computer Science 2023-09-12 Mengzhe Ruan , Guangfeng Yan , Yuanzhang Xiao , Linqi Song , Weitao Xu

AutoGD: Automatic Learning Rate Selection for Gradient Descent

The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate…

Machine Learning · Computer Science 2025-10-14 Nikola Surjanovic , Alexandre Bouchard-Côté , Trevor Campbell

Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression

We study nonparametric regression by an over-parameterized two-layer neural network trained by gradient descent (GD) in this paper. We show that, if the neural network is trained by GD with early stopping, then the trained network renders a…

Machine Learning · Statistics 2025-11-07 Yingzhen Yang , Ping Li

Fast Bounded Online Gradient Descent Algorithms for Scalable Kernel-Based Online Learning

Kernel-based online learning has often shown state-of-the-art performance for many online learning tasks. It, however, suffers from a major shortcoming, that is, the unbounded number of support vectors, making it non-scalable and unsuitable…

Machine Learning · Computer Science 2012-06-22 Peilin Zhao , Jialei Wang , Pengcheng Wu , Rong Jin , Steven C. H. Hoi

An Adaptive Gradient Method with Energy and Momentum

We introduce a novel algorithm for gradient-based optimization of stochastic objective functions. The method may be seen as a variant of SGD with momentum equipped with an adaptive learning rate automatically adjusted by an 'energy'…

Optimization and Control · Mathematics 2022-03-24 Hailiang Liu , Xuping Tian

Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers

We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or…

Machine Learning · Computer Science 2023-10-18 Serge Kas Hanna , Rawad Bitar , Parimal Parag , Venkat Dasari , Salim El Rouayheb

Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning

Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability. While various methods have been proposed to speed up their convergence, the model selection…

Machine Learning · Computer Science 2014-06-17 Francesco Orabona

Analyzing the discrepancy principle for kernelized spectral filter learning algorithms

We investigate the construction of early stopping rules in the nonparametric regression problem where iterative learning algorithms are used and the optimal iteration number is unknown. More precisely, we study the discrepancy principle, as…

Statistics Theory · Mathematics 2020-04-21 Alain Celisse , Martin Wahl

Quantum Natural Gradient with Efficient Backtracking Line Search

We consider the Quantum Natural Gradient Descent (QNGD) scheme which was recently proposed to train variational quantum algorithms. QNGD is Steepest Gradient Descent (SGD) operating on the complex projective space equipped with the…

Quantum Physics · Physics 2022-11-02 Touheed Anwar Atif , Uchenna Chukwu , Jesse Berwald , Raouf Dridi

Adaptive Restart for Accelerated Gradient Schemes

In this paper we demonstrate a simple heuristic adaptive restart technique that can dramatically improve the convergence rate of accelerated gradient schemes. The analysis of the technique relies on the observation that these schemes…

Optimization and Control · Mathematics 2012-04-19 Brendan O'Donoghue , Emmanuel Candes

Stochastic Gradient Descent with Adaptive Data

Stochastic gradient descent (SGD) is a powerful optimization technique that is particularly useful in online learning scenarios. Its convergence analysis is relatively well understood under the assumption that the data samples are…

Machine Learning · Computer Science 2024-10-03 Ethan Che , Jing Dong , Xin T. Tong

A Robust Adaptive Stochastic Gradient Method for Deep Learning

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre , Jose Sotelo , Marcin Moczulski , Yoshua Bengio

Interpreting Adaptive Gradient Methods by Parameter Scaling for Learning-Rate-Free Optimization

We address the challenge of estimating the learning rate for adaptive gradient methods used in training deep neural networks. While several learning-rate-free approaches have been proposed, they are typically tailored for steepest descent.…

Machine Learning · Computer Science 2024-01-09 Min-Kook Suh , Seung-Woo Seo

Gradient-Based Non-Linear Inverse Learning

We study statistical inverse learning in the context of nonlinear inverse problems under random design. Specifically, we address a class of nonlinear problems by employing gradient descent (GD) and stochastic gradient descent (SGD) with…

Machine Learning · Statistics 2024-12-24 Abhishake , Nicole Mücke , Tapio Helin

An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge

We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance…

Machine Learning · Statistics 2023-02-21 Kihyuk Hong , Yuhang Li , Ambuj Tewari

ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient

Stochastic gradient algorithms have been the main focus of large-scale learning problems and they led to important successes in machine learning. The convergence of SGD depends on the careful choice of learning rate and the amount of the…

Machine Learning · Computer Science 2015-11-03 Caglar Gulcehre , Marcin Moczulski , Yoshua Bengio