Related papers: Last-iterate convergence rates for min-max optimiz…

Stochastic Gradient Descent for Stochastic Doubly-Nonconvex Composite Optimization

The stochastic gradient descent has been widely used for solving composite optimization problems in big data analyses. Many algorithms and convergence properties have been developed. The composite functions were convex primarily and…

Machine Learning · Statistics 2020-03-03 Takayuki Kawashima, Hironori Fujisawa

Non-Asymptotic Convergence Analysis of Inexact Gradient Methods for Machine Learning Without Strong Convexity

Many recent applications in machine learning and data fitting call for the algorithmic solution of structured smooth convex optimization problems. Although the gradient descent method is a natural choice for this task, it requires exact…

Optimization and Control · Mathematics 2013-09-03 Anthony Man-Cho So

Recent Theoretical Advances in Non-Convex Optimization

Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical…

Optimization and Control · Mathematics 2021-11-29 Marina Danilova, Pavel Dvurechensky, Alexander Gasnikov, Eduard Gorbunov, Sergey Guminov, Dmitry Kamzolov, Innokentiy Shibaev

When Does Stochastic Gradient Algorithm Work Well?

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics 2018-12-27 Lam M. Nguyen, Nam H. Nguyen, Dzung T. Phan, Jayant R. Kalagnanam, Katya Scheinberg

Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation

Equilibrium computation on Riemannian manifolds provides a unifying framework for numerous problems in machine learning and data analytics. One of the simplest yet most fundamental methods is Riemannian gradient descent (RGD). While its…

Optimization and Control · Mathematics 2025-11-11 Yang Cai, Michael I. Jordan, Tianyi Lin, Argyris Oikonomou, Emmanouil-Vasileios Vlatakis-Gkaragkounis

First-order Convergence Theory for Weakly-Convex-Weakly-Concave Min-max Problems

In this paper, we consider first-order convergence theory and algorithms for solving a class of non-convex non-concave min-max saddle-point problems, whose objective function is weakly convex in the variables of minimization and weakly…

Optimization and Control · Mathematics 2021-07-08 Mingrui Liu, Hassan Rafique, Qihang Lin, Tianbao Yang

Maximin Optimization for Binary Regression

We consider regression problems with binary weights. Such optimization problems are ubiquitous in quantized learning models and digital communication systems. A natural approach is to optimize the corresponding Lagrangian using variants of…

Machine Learning · Computer Science 2020-12-01 Nisan Chiprut, Amir Globerson, Ami Wiesel

Improved Analysis of Clipping Algorithms for Non-convex Optimization

Gradient clipping is commonly used in training deep neural networks partly due to its practicability in relieving the exploding gradient problem. Recently, Zhang et al. (2019) show that clipped (stochastic) Gradient Descent (GD)…

Machine Learning · Computer Science 2020-10-30 Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang

On Linear Convergence of Distributed Stochastic Bilevel Optimization over Undirected Networks via Gradient Aggregation

Many large-scale constrained optimization problems can be formulated as bilevel distributed optimization tasks over undirected networks, where agents collaborate to minimize a global cost function while adhering to constraints, relying only…

Optimization and Control · Mathematics 2025-11-25 Ajay Tak, Mayank Baranwal

Convergence of Stochastic Proximal Gradient Algorithm

We prove novel convergence results for a stochastic proximal gradient algorithm suitable for solving a large class of convex optimization problems, where a convex objective function is given by the sum of a smooth and a possibly non-smooth…

Optimization and Control · Mathematics 2016-08-11 Lorenzo Rosasco, Silvia Villa, Bang Công Vũ

Understanding Accelerated Gradient Methods: Lyapunov Analyses and Hamiltonian Assisted Interpretations

We formulate two classes of first-order algorithms more general than previously studied for minimizing smooth and strongly convex or, respectively, smooth and convex functions. We establish sufficient conditions, via new discrete Lyapunov…

Optimization and Control · Mathematics 2023-04-21 Penghui Fu, Zhiqiang Tan

Convergence Rate for the Last Iterate of Stochastic Gradient Descent Schemes

We study the convergence rate for the last iterate of stochastic gradient descent (SGD) and stochastic heavy ball (SHB) in the parametric setting when the objective function $F$ is globally convex or non-convex whose gradient is…

Optimization and Control · Mathematics 2026-03-11 Marcel Hudiani

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

In this paper, we study the implicit regularization of the gradient descent algorithm in homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations. In particular, we study…

Machine Learning · Computer Science 2021-01-01 Kaifeng Lyu, Jian Li

On the Convergence of Step Decay Step-Size for Stochastic Optimization

The convergence of stochastic gradient descent is highly dependent on the step-size, especially on non-convex problems such as neural network training. Step decay step-size schedules (constant and then cut) are widely used in practice…

Optimization and Control · Mathematics 2021-02-19 Xiaoyu Wang, Sindri Magnússon, Mikael Johansson

Sharp global convergence guarantees for iterative nonconvex optimization: A Gaussian process perspective

We consider a general class of regression models with normally distributed covariates, and the associated nonconvex problem of fitting these models from data. We develop a general recipe for analyzing the convergence of iterative algorithms…

Optimization and Control · Mathematics 2021-09-22 Kabir Aladin Chandrasekher, Ashwin Pananjady, Christos Thrampoulidis

Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks

The optimization algorithms are crucial in training physics-informed neural networks (PINNs), as unsuitable methods may lead to poor solutions. Compared to the common gradient descent (GD) algorithm, implicit gradient descent (IGD)…

Machine Learning · Computer Science 2025-08-04 Xianliang Xu, Ting Du, Wang Kong, Bin Shan, Ye Li, Zhongyi Huang

Quadratic Gradient: A Unified Framework Bridging Gradient Descent and Newton-Type Methods by Synthesizing Hessians and Gradients

Accelerating the convergence of second-order optimization, particularly Newton-type methods, remains a pivotal challenge in algorithmic research. In this paper, we extend previous work on the Quadratic Gradient (QG) and rigorously…

Optimization and Control · Mathematics 2026-04-01 John Chiang

New Convergence Aspects of Stochastic Gradient Algorithms

The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is violated for cases where the objective…

Optimization and Control · Mathematics 2019-11-12 Lam M. Nguyen, Phuong Ha Nguyen, Peter Richtárik, Katya Scheinberg, Martin Takáč, Marten van Dijk

Convergence Analysis of Noisy Distributed Gradient Descent for Non-convex Optimization -- Saddle Point Escape

A variant of consensus based distributed gradient descent (DGD) is studied for finite sums of smooth but possibly non-convex functions. In particular, the local gradient term in the fixed step-size iteration of each agent is…

Optimization and Control · Mathematics 2026-05-27 Lei Qin, Michael Cantoni, Ye Pu

Hybrid Conditional Gradient - Smoothing Algorithms with Applications to Sparse and Low Rank Regularization

We study a hybrid conditional gradient - smoothing algorithm (HCGS) for solving composite convex optimization problems which contain several terms over a bounded set. Examples of these include regularization problems with several norms as…

Optimization and Control · Mathematics 2014-04-16 Andreas Argyriou, Marco Signoretto, Johan Suykens