Related papers: Stochastic gradient algorithms from ODE splitting …

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

We obtain an improved finite-sample guarantee on the linear convergence of stochastic gradient descent for smooth and strongly convex objectives, improving from a quadratic dependence on the conditioning $(L/\mu)^2$ (where $L$ is a bound on…

Numerical Analysis · Mathematics 2015-01-19 Deanna Needell, Nathan Srebro, Rachel Ward

Randomised Splitting Methods and Stochastic Gradient Descent

We explore an explicit link between stochastic gradient descent using common batching strategies and splitting methods for ordinary differential equations. From this perspective, we introduce a new minibatching strategy (called Symmetric…

Optimization and Control · Mathematics 2025-04-08 Luke Shaw, Peter A. Whalley

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be $O(\log(T)/T)$, by running SGD for…

Machine Learning · Computer Science 2015-03-19 Alexander Rakhlin, Ohad Shamir, Karthik Sridharan

On Stochastic Gradient and Subgradient Methods with Adaptive Steplength Sequences

The performance of standard stochastic approximation implementations can vary significantly based on the choice of the steplength sequence, and in general, little guidance is provided about good choices. Motivated by this gap, in the first…

Optimization and Control · Mathematics 2015-03-19 Farzad Yousefian, Angelia Nedić, Uday V. Shanbhag

Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates

The stochastic gradient descent (SGD) optimization algorithm plays a central role in a series of machine learning applications. The scientific literature provides a vast amount of upper error bounds for the SGD method. Much less attention…

Numerical Analysis · Mathematics 2020-10-05 Arnulf Jentzen, Philippe von Wurstemberger

When Does Stochastic Gradient Algorithm Work Well?

In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a…

Machine Learning · Statistics 2018-12-27 Lam M. Nguyen, Nam H. Nguyen, Dzung T. Phan, Jayant R. Kalagnanam, Katya Scheinberg

Optimized convergence of stochastic gradient descent by weighted averaging

Under mild assumptions stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as an approximate optimal solution. However, in the absence of stochastic noise, the…

Optimization and Control · Mathematics 2022-10-06 Melinda Hagedorn, Florian Jarre

Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems

Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation. In this paper, we exhibit a step size scheme for SGD on a…

Machine Learning · Computer Science 2015-02-11 Christopher De Sa, Kunle Olukotun, Christopher Ré

Adaptive Sequential Machine Learning

A framework previously introduced in [3] for solving a sequence of stochastic optimization problems with bounded changes in the minimizers is extended and applied to machine learning problems such as regression and classification. The…

Machine Learning · Computer Science 2019-04-08 Craig Wilson, Yuheng Bu, Venugopal Veeravalli

Stochastic gradient with least-squares control variates

The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by…

Optimization and Control · Mathematics 2025-11-21 Fabio Nobile, Matteo Raviola, Nathan Schaeffer

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

We propose a new stochastic optimization framework for empirical risk minimization problems such as those that arise in machine learning. The traditional approaches, such as (mini-batch) stochastic gradient descent (SGD), utilize an…

Machine Learning · Statistics 2020-02-04 Kenji Kawaguchi, Haihao Lu

Adaptive Sequential Optimization with Applications to Machine Learning

A framework is introduced for solving a sequence of slowly changing optimization problems, including those arising in regression and classification applications, using optimization algorithms such as stochastic gradient descent (SGD). The…

Machine Learning · Computer Science 2015-09-25 Craig Wilson, Venugopal V. Veeravalli

On the Convergence and Complexity of the Stochastic Central Finite-Difference Based Gradient Estimation Methods

This paper presents an algorithmic framework for solving unconstrained stochastic optimization problems using only stochastic function evaluations. We employ central finite-difference based gradient estimation methods to approximate the…

Optimization and Control · Mathematics 2025-01-14 Raghu Bollapragada, Cem Karamanli

Stochastic Gradient Descent in the Viewpoint of Graduated Optimization

The stochastic gradient descent (SGD) method is popular for solving non-convex optimization problems in machine learning. This work investigates SGD from a viewpoint of graduated optimization, which is a widely applied approach for non-convex…

Optimization and Control · Mathematics 2023-08-15 Da Li, Jingjing Wu, Qingrun Zhang

An introduction to decentralized stochastic optimization with gradient tracking

Decentralized solutions to finite-sum minimization are of significant importance in many signal processing, control, and machine learning applications. In such settings, the data is distributed over a network of arbitrarily-connected nodes…

Machine Learning · Computer Science 2019-11-14 Ran Xin, Soummya Kar, Usman A. Khan

Optimization via First-Order Switching Methods: Skew-Symmetric Dynamics and Optimistic Discretization

Large-scale constrained optimization problems are at the core of many tasks in control, signal processing, and machine learning. Notably, problems with functional constraints arise when, beyond a performance-centric goal…

Optimization and Control · Mathematics 2025-05-15 Antesh Upadhyay, Sang Bin Moon, Abolfazl Hashemi

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad, Hamad Alamri, Abdelhamid Bouchachia

Gradient Methods with Online Scaling Part I. Theoretical Foundations

This paper establishes the theoretical foundations of the online scaled gradient methods (OSGM), a framework that utilizes online learning to adapt stepsizes and provably accelerate first-order methods. OSGM quantifies the effectiveness of…

Optimization and Control · Mathematics 2025-09-08 Wenzhi Gao, Ya-Chi Chu, Yinyu Ye, Madeleine Udell

S-D-RSM: Stochastic Distributed Regularized Splitting Method for Large-Scale Convex Optimization Problems

This paper investigates the problem of large-scale distributed composite convex optimization, with motivations from a broad range of applications, including multi-agent systems, federated learning, smart grids, wireless sensor networks,…

Optimization and Control · Mathematics 2025-12-16 Maoran Wang, Xingju Cai, Yongxin Chen

On Graduated Optimization for Stochastic Non-Convex Problems

The graduated optimization approach, also known as the continuation method, is a popular heuristic for solving non-convex problems that has received renewed interest over the last decade. Despite its popularity, very little is known in terms…

Machine Learning · Computer Science 2015-07-28 Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz