Related papers: CompAdaGrad: A Compressed, Complementary, Computat…

200 papers

Adaptive stochastic gradient methods such as AdaGrad have gained popularity in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second order information by…

A novel approach is given to overcome the computational challenges of the full-matrix Adaptive Gradient algorithm (Full AdaGrad) in stochastic optimization. By developing a recursive method that estimates the inverse of the square root of…

Statistics Theory · Mathematics 2025-02-28 Antoine Godichon-Baggioni, Wei Lu, Bruno Portier

Adaptive gradient methods like Adagrad and its variants are widespread in large-scale optimization. However, their use of diagonal preconditioning matrices limits the ability to capture parameter correlations. Full-matrix adaptive methods,…

Machine Learning · Computer Science 2025-09-01 Tatyana Matveeva, Aleksandr Katrutsa, Evgeny Frolov

We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods. MADGRAD shows excellent performance on deep learning optimization problems from multiple fields, including classification and…

Machine Learning · Computer Science 2021-08-27 Aaron Defazio, Samy Jelassi

Adaptive gradient methods such as AdaGrad and its variants update the stepsize in stochastic gradient descent on the fly according to the gradients received along the way; such methods have gained widespread use in large-scale optimization…

Machine Learning · Statistics 2021-04-20 Rachel Ward, Xiaoxia Wu, Leon Bottou

We provide a new adaptive method for online convex optimization, MetaGrad, that is robust to general convex losses but achieves faster rates for a broad class of special functions, including exp-concave and strongly convex functions, but…

Machine Learning · Computer Science 2021-08-31 Tim van Erven, Wouter M. Koolen, Dirk van der Hoeven

We provide new adaptive first-order methods for constrained convex optimization. Our main algorithms AdaACSA and AdaAGD+ are accelerated methods, which are universal in the sense that they achieve nearly-optimal convergence rates for both…

Machine Learning · Computer Science 2021-02-17 Alina Ene, Huy L. Nguyen, Adrian Vladu

Vanilla gradient methods are often highly sensitive to the choice of stepsize, which typically requires manual tuning. Adaptive methods alleviate this issue and have therefore become widely used. Among them, AdaGrad has been particularly…

Machine Learning · Statistics 2026-02-16 Matia Bojovic, Saverio Salzo, Massimiliano Pontil

In online convex optimization it is well known that certain subclasses of objective functions are much easier than arbitrary convex functions. We are interested in designing adaptive methods that can automatically get fast rates in as many…

Machine Learning · Computer Science 2021-08-31 Tim van Erven, Wouter M. Koolen

The classical AdaGrad method adapts the learning rate by dividing by the square root of a sum of squared gradients. Because this sum on the denominator is increasing, the method can only decrease step sizes over time, and requires a…

Machine Learning · Computer Science 2022-06-15 Aaron Defazio, Baoyu Zhou, Lin Xiao

Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100 of TeraOps/s of computational capacity) is expected to be severely communication constrained. To overcome this limitation, new gradient…

Machine Learning · Computer Science 2017-12-08 Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei Zhang, Kailash Gopalakrishnan

Adaptive gradient methods like AdaGrad are widely used in optimizing neural networks. Yet, existing convergence guarantees for adaptive gradient methods require either convexity or smoothness, and, in the smooth setting, only guarantee…

Machine Learning · Computer Science 2019-10-22 Xiaoxia Wu, Simon S. Du, Rachel Ward

Adaptive gradient optimizers (AdaGrad), which dynamically adjust the learning rate based on iterative gradients, have emerged as powerful tools in deep learning. These adaptive methods have significantly succeeded in various deep learning…

Optimization and Control · Mathematics 2024-12-31 Ruinan Jin, Xiaoyu Wang, Baoxiang Wang

We present adaptive gradient methods (both basic and accelerated) for solving convex composite optimization problems in which the main part is approximately smooth (a.k.a. $(\delta, L)$-smooth) and can be accessed only via a (potentially…

Optimization and Control · Mathematics 2024-06-11 Anton Rodomanov, Xiaowen Jiang, Sebastian Stich

Adaptive gradient algorithms such as ADAGRAD and its variants have gained popularity in the training of deep neural networks. While many works on adaptive methods have focused on static regret as a performance metric to achieve a…

Machine Learning · Computer Science 2022-09-07 Parvin Nazari, Esmaile Khorram

Adaptive gradient methods have been widely adopted in training large-scale deep neural networks, especially large foundation models. Despite the huge success in practice, their theoretical advantages over classical gradient methods with…

Machine Learning · Computer Science 2024-10-15 Yuxing Liu, Rui Pan, Tong Zhang

Stochastic Gradient Descent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic…

As application demands for online convex optimization accelerate, the need to design new methods that simultaneously cover a large class of convex functions and incur the lowest possible regret is rising rapidly. Known online…

Machine Learning · Computer Science 2019-06-04 Saeed Masoudian, Ali Arabzadeh, Mahdi Jafari Siavoshani, Milad Jalal, Alireza Amouzad

Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We…

Machine Learning · Statistics 2018-05-23 Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht

Existing analysis of AdaGrad and other adaptive methods for smooth convex optimization is typically for functions with bounded domain diameter. In unconstrained problems, previous works guarantee an asymptotic convergence rate without an…

Machine Learning · Computer Science 2023-10-05 Zijian Liu, Ta Duy Nguyen, Alina Ene, Huy L. Nguyen