Related papers

Related papers: Adaptive Stopping Rule for Kernel-based Gradient D…

200 papers

This paper proposes a novel parameter selection strategy for kernel-based gradient descent (KGD) algorithms, integrating bias-variance analysis with the splitting method. We introduce the concept of empirical effective dimension to quantify…

Machine Learning · Statistics · 2026-03-05 · Xiaotong Liu, Yunwen Lei, Xiangyu Chang, Shao-Bo Lin

We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or…

Machine Learning · Computer Science · 2022-08-08 · Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkat Dasari, Salim El Rouayheb

A central challenge in Bayesian inference is efficiently approximating posterior distributions. Stein Variational Gradient Descent (SVGD) is a popular variational inference method which transports a set of particles to approximate a target…

Machine Learning · Statistics · 2025-12-05 · Moritz Melcher, Simon Weissmann, Ashia C. Wilson, Jakob Zech

The strategy of early stopping is a regularization technique based on choosing a stopping time for an iterative algorithm. Focusing on non-parametric regression in a reproducing kernel Hilbert space, we analyze the early stopping strategy…

Machine Learning · Statistics · 2013-06-18 · Garvesh Raskutti, Martin J. Wainwright, Bin Yu

Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification,…

Machine Learning · Computer Science · 2023-09-12 · Mengzhe Ruan, Guangfeng Yan, Yuanzhang Xiao, Linqi Song, Weitao Xu

The performance of gradient-based optimization methods, such as standard gradient descent (GD), greatly depends on the choice of learning rate. However, it can require a non-trivial amount of user tuning effort to select an appropriate…

Machine Learning · Computer Science · 2025-10-14 · Nikola Surjanovic, Alexandre Bouchard-Côté, Trevor Campbell

We study nonparametric regression by an over-parameterized two-layer neural network trained by gradient descent (GD) in this paper. We show that, if the neural network is trained by GD with early stopping, then the trained network renders a…

Machine Learning · Statistics · 2025-11-07 · Yingzhen Yang, Ping Li

Kernel-based online learning has often shown state-of-the-art performance for many online learning tasks. It, however, suffers from a major shortcoming, that is, the unbounded number of support vectors, making it non-scalable and unsuitable…

Machine Learning · Computer Science · 2012-06-22 · Peilin Zhao, Jialei Wang, Pengcheng Wu, Rong Jin, Steven C. H. Hoi

We introduce a novel algorithm for gradient-based optimization of stochastic objective functions. The method may be seen as a variant of SGD with momentum equipped with an adaptive learning rate automatically adjusted by an 'energy'…

Optimization and Control · Mathematics · 2022-03-24 · Hailiang Liu, Xuping Tian

We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or…

Machine Learning · Computer Science · 2023-10-18 · Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkat Dasari, Salim El Rouayheb

Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability. While various methods have been proposed to speed up their convergence, the model selection…

Machine Learning · Computer Science · 2014-06-17 · Francesco Orabona

We investigate the construction of early stopping rules in the nonparametric regression problem where iterative learning algorithms are used and the optimal iteration number is unknown. More precisely, we study the discrepancy principle, as…

Statistics Theory · Mathematics · 2020-04-21 · Alain Celisse, Martin Wahl

We consider the Quantum Natural Gradient Descent (QNGD) scheme which was recently proposed to train variational quantum algorithms. QNGD is Steepest Gradient Descent (SGD) operating on the complex projective space equipped with the…

Quantum Physics · Physics · 2022-11-02 · Touheed Anwar Atif, Uchenna Chukwu, Jesse Berwald, Raouf Dridi

In this paper we demonstrate a simple heuristic adaptive restart technique that can dramatically improve the convergence rate of accelerated gradient schemes. The analysis of the technique relies on the observation that these schemes…

Optimization and Control · Mathematics · 2012-04-19 · Brendan O'Donoghue, Emmanuel Candes

Stochastic gradient descent (SGD) is a powerful optimization technique that is particularly useful in online learning scenarios. Its convergence analysis is relatively well understood under the assumption that the data samples are…

Machine Learning · Computer Science · 2024-10-03 · Ethan Che, Jing Dong, Xin T. Tong

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science · 2017-03-03 · Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

We address the challenge of estimating the learning rate for adaptive gradient methods used in training deep neural networks. While several learning-rate-free approaches have been proposed, they are typically tailored for steepest descent.…

Machine Learning · Computer Science · 2024-01-09 · Min-Kook Suh, Seung-Woo Seo

We study statistical inverse learning in the context of nonlinear inverse problems under random design. Specifically, we address a class of nonlinear problems by employing gradient descent (GD) and stochastic gradient descent (SGD) with…

Machine Learning · Statistics · 2024-12-24 · Abhishake, Nicole Mücke, Tapio Helin

We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance…

Machine Learning · Statistics · 2023-02-21 · Kihyuk Hong, Yuhang Li, Ambuj Tewari

Stochastic gradient algorithms have been the main focus of large-scale learning problems and they led to important successes in machine learning. The convergence of SGD depends on the careful choice of learning rate and the amount of the…

Machine Learning · Computer Science · 2015-11-03 · Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio