Related papers: Predictive Local Smoothness for Stochastic Gradien…

Local Convergence Properties of SAGA/Prox-SVRG and Acceleration

Over the past ten years, driven by large scale optimisation problems arising from machine learning, the development of stochastic optimisation methods has witnessed tremendous growth. However, despite their popularity, the theoretical…

Optimization and Control · Mathematics 2018-11-05 Clarice Poon, Jingwei Liang, Carola-Bibiane Schönlieb

Tackling benign nonconvexity with smoothing and stochastic gradients

Non-convex optimization problems are ubiquitous in machine learning, especially in Deep Learning. While such complex problems can often be successfully optimized in practice by using stochastic gradient descent (SGD), theoretical analysis…

Machine Learning · Computer Science 2022-02-21 Harsh Vardhan, Sebastian U. Stich

Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization

Our work focuses on stochastic gradient methods for optimizing a smooth non-convex loss function with a non-smooth non-convex regularizer. Research on this class of problem is quite limited, and until recently no non-asymptotic convergence…

Optimization and Control · Mathematics 2019-05-15 Michael R. Metel, Akiko Takeda

A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed…

Optimization and Control · Mathematics 2013-03-12 Nicolas Le Roux, Mark Schmidt, Francis Bach

Stochastic Proximal Methods for Non-Smooth Non-Convex Constrained Sparse Optimization

This paper focuses on stochastic proximal gradient methods for optimizing a smooth non-convex loss function with a non-smooth non-convex regularizer and convex constraints. To the best of our knowledge we present the first non-asymptotic…

Optimization and Control · Mathematics 2019-05-27 Michael R. Metel, Akiko Takeda

Optimal Asymptotic Rates for (Stochastic) Gradient Descent under the Local PL-Condition: A Geometric Approach

Stochastic gradient descent (SGD) has been studied extensively over the past decades due to its simplicity and broad applicability in machine learning. In this work, we analyze the local behavior of gradient descent and stochastic gradient…

Optimization and Control · Mathematics 2026-05-15 Sebastian Kassing, Thomas Kruse

A Robust Adaptive Stochastic Gradient Method for Deep Learning

Stochastic gradient algorithms are the main focus of large-scale optimization problems and have led to important successes in the recent advancement of deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

Lp and almost sure rates of convergence of averaged stochastic gradient algorithms: locally strongly convex objective

A usual problem in statistics consists in estimating the minimizer of a convex function. When we have to deal with large samples taking values in high dimensional spaces, stochastic gradient algorithms and their averaged versions are…

Statistics Theory · Mathematics 2022-01-12 Antoine Godichon-Baggioni

Adaptive Learning Rates for Faster Stochastic Gradient Methods

In this work, we propose new adaptive step size strategies that improve several stochastic gradient methods. Our first method (StoPS) is based on the classical Polyak step size (Polyak, 1987) and is an extension of the recent development of…

Machine Learning · Computer Science 2022-08-11 Samuel Horváth, Konstantin Mishchenko, Peter Richtárik

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(\log(T)/T), by running SGD for…

Machine Learning · Computer Science 2015-03-19 Alexander Rakhlin, Ohad Shamir, Karthik Sridharan

No More Pesky Learning Rates

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any…

Machine Learning · Statistics 2013-02-19 Tom Schaul, Sixin Zhang, Yann LeCun

A Unified Theory of Stochastic Proximal Point Methods without Smoothness

This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM). Proximal point methods have attracted considerable interest owing to their numerical stability and robustness…

Optimization and Control · Mathematics 2024-05-28 Peter Richtárik, Abdurakhmon Sadiev, Yury Demidovich

Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization

We propose and analyze several stochastic gradient algorithms for finding stationary points or local minima in nonconvex finite-sum and online optimization problems, possibly with a nonsmooth regularizer. First, we propose a simple proximal…

Machine Learning · Computer Science 2022-08-23 Zhize Li, Jian Li

Randomized Smoothing for Stochastic Optimization

We analyze convergence rates of stochastic optimization procedures for non-smooth convex optimization problems. By combining randomized smoothing techniques with accelerated gradient methods, we obtain convergence rates of stochastic…

Optimization and Control · Mathematics 2012-04-10 John C. Duchi, Peter L. Bartlett, Martin J. Wainwright

Revisiting Stochastic Proximal Point Methods: Generalized Smoothness and Similarity

The growing prevalence of nonsmooth optimization problems in machine learning has spurred significant interest in generalized smoothness assumptions. Among these, the (L0, L1)-smoothness assumption has emerged as one of the most prominent.…

Optimization and Control · Mathematics 2026-02-24 Zhirayr Tovmasyan, Grigory Malinovsky, Laurent Condat, Peter Richtárik

Extended convexity and smoothness and their applications in deep learning

Classical assumptions like strong convexity and Lipschitz smoothness often fail to capture the nature of deep learning optimization problems, which are typically non-convex and non-smooth, making traditional analyses less applicable. This…

Machine Learning · Computer Science 2025-05-01 Binchuan Qi, Wei Gong, Li Li

SPRINT: Stochastic Performative Prediction With Variance Reduction

Performative prediction (PP) is an algorithmic framework for optimizing machine learning (ML) models where the model's deployment affects the distribution of the data it is trained on. Compared to traditional ML with fixed data, designing…

Machine Learning · Computer Science 2025-09-24 Tian Xie, Ding Zhu, Jia Liu, Mahdi Khalili, Xueru Zhang

Faster Sampling via Stochastic Gradient Proximal Sampler

Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems. However, the proximal sampler, which exhibits much faster convergence than…

Machine Learning · Statistics 2024-05-28 Xunpeng Huang, Difan Zou, Yi-An Ma, Hanze Dong, Tong Zhang

Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming

In this paper, we introduce a new stochastic approximation (SA) type algorithm, namely the randomized stochastic gradient (RSG) method, for solving an important class of nonlinear (possibly nonconvex) stochastic programming (SP) problems.…

Optimization and Control · Mathematics 2015-10-27 Saeed Ghadimi, Guanghui Lan

Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems

In this paper, we introduce a stochastic projected subgradient method for weakly convex (i.e., uniformly prox-regular) nonsmooth, nonconvex functions, a wide class of functions which includes the additive and convex composite classes. At a…

Optimization and Control · Mathematics 2018-09-19 Damek Davis, Benjamin Grimmer