Related papers: Perturbed Iterate Analysis for Asynchronous Stocha…

Improved asynchronous parallel optimization analysis for stochastic incremental methods

As datasets continue to increase in size and multi-core computer architectures are developed, asynchronous parallel optimization algorithms become more and more essential to the field of Machine Learning. Unfortunately, conducting the…

Optimization and Control · Mathematics 2019-03-26 Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien

Accelerating Perturbed Stochastic Iterates in Asynchronous Lock-Free Optimization

We show that stochastic acceleration can be achieved under the perturbed iterate framework (Mania et al., 2017) in asynchronous lock-free optimization, which leads to the optimal incremental gradient complexity for finite-sum objectives. We…

Optimization and Control · Mathematics 2021-10-01 Kaiwen Zhou, Anthony Man-Cho So, James Cheng

Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms

Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning problems. Researchers and industry have developed several techniques to optimize SGD's runtime performance, including asynchronous execution and…

Machine Learning · Computer Science 2015-10-06 Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré

On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have…

Machine Learning · Computer Science 2016-01-26 Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola

Fast Asynchronous Parallel Stochastic Gradient Decent

Stochastic gradient descent (SGD) and its variants have become more and more popular in machine learning due to their efficiency and effectiveness. To handle large-scale problems, researchers have recently proposed several parallel SGD…

Machine Learning · Statistics 2015-08-25 Shen-Yi Zhao, Wu-Jun Li

Asynchronous Heavy-Tailed Optimization

Heavy-tailed stochastic gradient noise, commonly observed in transformer models, can destabilize the optimization process. Recent works mainly focus on developing and understanding approaches to address heavy-tailed noise in the centralized…

Machine Learning · Computer Science 2026-02-23 Junfei Sun, Dixi Yao, Xuchen Gong, Tahseen Rabbani, Manzil Zaheer, Tian Li

SGD and Hogwild! Convergence Without the Bounded Gradients Assumption

Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical convergence analysis of SGD is…

Optimization and Control · Mathematics 2018-07-10 Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, Katya Scheinberg, Martin Takáč

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require…

Optimization and Control · Mathematics 2011-11-14 Feng Niu, Benjamin Recht, Christopher Ré, Stephen J. Wright

Parallel Stochastic Gradient Descent with Sound Combiners

Stochastic gradient descent (SGD) is a well-known method for regression and classification tasks. However, it is an inherently sequential algorithm: at each step, the processing of the current example depends on the parameters learned from…

Machine Learning · Computer Science 2017-05-24 Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz

Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise

In this paper, we propose a unified view of gradient-based algorithms for stochastic convex composite optimization by extending the concept of estimate sequence introduced by Nesterov. More precisely, we interpret a large class of…

Machine Learning · Statistics 2020-09-07 Andrei Kulunchakov, Julien Mairal

Structured and Fast Optimization: The Kronecker SGD Algorithm

Stochastic gradient descent (SGD) now acts as a fundamental part of optimization in current machine learning. Meanwhile, deep learning architectures have shown outstanding performance in a wide range of fields, such as natural language…

Machine Learning · Computer Science 2026-01-27 Zhao Song, Song Yue

Convergence Analysis of Stochastic Accelerated Gradient Methods for Generalized Smooth Optimizations

We investigate the Randomized Stochastic Accelerated Gradient (RSAG) method, utilizing either constant or adaptive step sizes, for stochastic optimization problems with generalized smooth objective functions. Under relaxed affine variance…

Optimization and Control · Mathematics 2025-02-25 Chenhao Yu, Yusu Hong, Junhong Lin

Stochastic Gradient Langevin with Delayed Gradients

Stochastic Gradient Langevin Dynamics (SGLD) ensures strong guarantees with regard to convergence in measure for sampling log-concave posterior distributions by adding noise to stochastic gradient iterates. Given the size of many practical…

Machine Learning · Computer Science 2020-06-15 Vyacheslav Kungurtsev, Bapi Chatterjee, Dan Alistarh

A Fast Algorithm for Separated Sparsity via Perturbed Lagrangians

Sparsity-based methods are widely used in machine learning, statistics, and signal processing. There is now a rich class of structured sparsity approaches that expand the modeling power of the sparsity paradigm and incorporate constraints…

Data Structures and Algorithms · Computer Science 2017-12-22 Aleksander Mądry, Slobodan Mitrović, Ludwig Schmidt

A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization

In this paper, we propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) algorithm. To avoid over-reducing the variance of gradient by…

Machine Learning · Computer Science 2021-02-22 Jia Bi, Steve R. Gunn

Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization

We provide the first theoretical analysis on the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on non-convex optimization. Recent studies have shown that the asynchronous stochastic…

Machine Learning · Computer Science 2016-12-21 Zhouyuan Huo, Heng Huang

Stochastic Variance-Reduced Iterative Hard Thresholding in Graph Sparsity Optimization

Stochastic optimization algorithms are widely used for large-scale data analysis due to their low per-iteration costs, but they often suffer from slow asymptotic convergence caused by inherent variance. Variance-reduced techniques have been…

Machine Learning · Statistics 2024-07-25 Derek Fox, Samuel Hernandez, Qianqian Tong

Asynchronous stochastic convex optimization

We show that asymptotically, completely asynchronous stochastic gradient procedures achieve optimal (even to constant factors) convergence rates for the solution of convex optimization problems under nearly the same conditions required for…

Optimization and Control · Mathematics 2015-08-05 John C. Duchi, Sorathan Chaturapruek, Christopher Ré

Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization

Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Yet,…

Optimization and Control · Mathematics 2017-11-07 Fabian Pedregosa, Rémi Leblond, Simon Lacoste-Julien

Variance-Reduced Decentralized Stochastic Optimization with Accelerated Convergence

This paper describes a novel algorithmic framework to minimize a finite sum of functions available over a network of nodes. The proposed framework, which we call GT-VR, is stochastic and decentralized, and thus is particularly suitable for…

Optimization and Control · Mathematics 2020-12-02 Ran Xin, Usman A. Khan, Soummya Kar