Related papers: Last-iterate convergence rates for min-max optimiz…

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

We study the asynchronous stochastic gradient descent algorithm for distributed training over $n$ workers which have varying computation and communication frequency over time. In this algorithm, workers compute stochastic gradients in…

Machine Learning · Computer Science 2022-06-17 Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

Convergence of Unregularized Online Learning Algorithms

In this paper we study the convergence of online gradient descent algorithms in reproducing kernel Hilbert spaces (RKHSs) without regularization. We establish a sufficient condition and a necessary condition for the convergence of excess…

Machine Learning · Computer Science 2017-08-11 Yunwen Lei, Lei Shi, Zheng-Chu Guo

Optimal convergence rates of totally asynchronous optimization

Asynchronous optimization algorithms are at the core of modern machine learning and resource allocation systems. However, most convergence results consider bounded information delays and several important algorithms lack guarantees when…

Optimization and Control · Mathematics 2022-03-10 Xuyang Wu, Sindri Magnusson, Hamid Reza Feyzmahdavian, Mikael Johansson

Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees

We introduce novel convergence results for asynchronous iterations that appear in the analysis of parallel and distributed optimization algorithms. The results are simple to apply and give explicit estimates for how the degree of asynchrony…

Optimization and Control · Mathematics 2023-04-04 Hamid Reza Feyzmahdavian, Mikael Johansson

Accelerated Almost-Sure Convergence Rates for Nonconvex Stochastic Gradient Descent using Stochastic Learning Rates

Large-scale optimization problems require algorithms that are both effective and efficient. One such popular and proven algorithm is Stochastic Gradient Descent, which uses first-order gradient information to solve these problems. This paper studies…

Optimization and Control · Mathematics 2021-11-11 Theodoros Mamalis, Dusan Stipanovic, Petros Voulgaris

Convergence Analysis of Homotopy-SGD for non-convex optimization

First-order stochastic methods for solving large-scale non-convex optimization problems are widely used in many big-data applications, e.g. training deep neural networks as well as other complex and potentially non-convex machine learning…

Machine Learning · Computer Science 2020-11-23 Matilde Gargiani, Andrea Zanelli, Quoc Tran-Dinh, Moritz Diehl, Frank Hutter

Convergence Rate of Stochastic Gradient Search in the Case of Multiple and Non-Isolated Minima

The convergence rate of stochastic gradient search is analyzed in this paper. Using arguments based on differential geometry and Lojasiewicz inequalities, tight bounds on the convergence rate of general stochastic gradient algorithms are…

Optimization and Control · Mathematics 2009-04-28 Vladislav B. Tadić

On the convergence of mirror descent beyond stochastic convex programming

In this paper, we examine the convergence of mirror descent in a class of stochastic optimization problems that are not necessarily convex (or even quasi-convex), and which we call variationally coherent. Since the standard technique of…

Optimization and Control · Mathematics 2018-07-17 Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Stephen Boyd, Peter Glynn

On Stochastic Subgradient Mirror-Descent Algorithm with Weighted Averaging

This paper considers the stochastic subgradient mirror-descent method for solving constrained convex minimization problems. In particular, a stochastic subgradient mirror-descent method with weighted iterate-averaging is investigated and its…

Optimization and Control · Mathematics 2013-07-09 Angelia Nedich, Soomin Lee

Optimality of the final model found via Stochastic Gradient Descent

We study convergence properties of Stochastic Gradient Descent (SGD) for convex objectives without assumptions on smoothness or strict convexity. We consider the question of establishing that with high probability the objective evaluated at…

Machine Learning · Computer Science 2018-10-23 Andrea Schioppa

On the Finite Time Convergence of Cyclic Coordinate Descent Methods

Cyclic coordinate descent is a classic optimization method that has witnessed a resurgence of interest in machine learning. Reasons for this include its simplicity, speed and stability, as well as its competitive performance on $\ell_1$…

Machine Learning · Computer Science 2015-03-17 Ankan Saha, Ambuj Tewari

Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile

Owing to their connection with generative adversarial networks (GANs), saddle-point problems have recently attracted considerable interest in machine learning and beyond. By necessity, most theoretical guarantees revolve around…

Machine Learning · Computer Science 2018-10-03 Panayotis Mertikopoulos, Bruno Lecouat, Houssam Zenati, Chuan-Sheng Foo, Vijay Chandrasekhar, Georgios Piliouras

Aug-PDG: Linear Convergence of Convex Optimization with Inequality Constraints

This paper investigates the convex optimization problem with general convex inequality constraints. To cope with this problem, a discrete-time algorithm, called augmented primal-dual gradient algorithm (Aug-PDG), is studied and analyzed. It…

Optimization and Control · Mathematics 2020-11-18 Min Meng, Xiuxian Li

Optimized convergence of stochastic gradient descent by weighted averaging

Under mild assumptions stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as an approximate optimal solution. However, in the absence of stochastic noise, the…

Optimization and Control · Mathematics 2022-10-06 Melinda Hagedorn, Florian Jarre

High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking

We study high-probability (HP) convergence guarantees in decentralized stochastic optimization, where multiple agents collaborate to jointly train a model over a network. Existing HP results in decentralized settings almost exclusively…

Machine Learning · Computer Science 2026-05-04 Aleksandar Armacki, Haoyuan Cai, Ali H. Sayed

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be $O(\log(T)/T)$, by running SGD for…

Machine Learning · Computer Science 2015-03-19 Alexander Rakhlin, Ohad Shamir, Karthik Sridharan

Matrix Completion via Nonconvex Regularization: Convergence of the Proximal Gradient Algorithm

Matrix completion has attracted much interest in the past decade in machine learning and computer vision. For low-rank promotion in matrix completion, the nuclear norm penalty is convenient due to its convexity but has a bias problem.…

Machine Learning · Computer Science 2019-03-05 Fei Wen, Rendong Ying, Peilin Liu, Trieu-Kien Truong

Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection

Attention mechanisms have revolutionized several domains of artificial intelligence, such as natural language processing and computer vision, by enabling models to selectively focus on relevant parts of the input data. While recent work has…

Machine Learning · Computer Science 2026-02-03 Addison Kristanto Julistiono, Davoud Ataee Tarzanagh, Navid Azizan

A Globally and Quadratically Convergent Algorithm with Efficient Implementation for Unconstrained Optimization

In this paper, an efficient modified Newton-type algorithm is proposed for nonlinear unconstrained optimization problems. The modified Hessian is a convex combination of the identity matrix (for the steepest descent algorithm) and the Hessian…

Optimization and Control · Mathematics 2015-10-09 Yaguang Yang

Projected Subgradient Ascent for Convex Maximization

We consider the problem of maximizing a convex function over a closed convex set in a real Hilbert space. For linear functions, we show that a single orthogonal projection suffices to obtain an approximate solution. For continuous convex…

Optimization and Control · Mathematics 2026-02-23 Pedro Felzenszwalb, Heon Lee