Related papers: Accelerating Minibatch Stochastic Gradient Descent…

Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling

Machine learning, especially deep neural networks, has been rapidly developed in fields including computer vision, speech recognition and reinforcement learning. Although Mini-batch SGD is one of the most popular stochastic optimization…

Machine Learning · Computer Science 2019-03-12 Xinyu Peng, Li Li, Fei-Yue Wang

Accelerating Stochastic Gradient Descent Using Antithetic Sampling

(Mini-batch) Stochastic Gradient Descent is a popular optimization method which has been applied to many machine learning applications. But a rather high variance introduced by the stochastic gradient in each step may slow down the…

Machine Learning · Computer Science 2018-10-09 Jingchang Liu, Linli Xu

Batched Stochastic Gradient Descent with Weighted Sampling

We analyze a batched variant of Stochastic Gradient Descent (SGD) with weighted sampling distribution for smooth and non-smooth objective functions. We show that by distributing the batches computationally, a significant speedup in the…

Numerical Analysis · Mathematics 2017-03-02 Deanna Needell, Rachel Ward

Online Learning to Sample

Stochastic Gradient Descent (SGD) is one of the most widely used techniques for online optimization in machine learning. In this work, we accelerate SGD by adaptively learning how to sample the most useful training examples at each time…

Machine Learning · Computer Science 2016-03-16 Guillaume Bouchard, Théo Trouillon, Julien Perez, Adrien Gaidon

Distributed Stochastic Optimization via Adaptive SGD

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent…

Machine Learning · Statistics 2018-10-30 Ashok Cutkosky, Robert Busa-Fekete

Don't Use Large Mini-Batches, Use Local SGD

Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of deep neural networks. Drastic increases in the mini-batch sizes have led to key efficiency and scalability gains in recent years. However,…

Machine Learning · Computer Science 2020-02-18 Tao Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi

Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training

Stochastic gradient descent (SGD) and its variants have been the dominating optimization methods in machine learning. Compared to SGD with small-batch training, SGD with large-batch training can better utilize the computational power of…

Machine Learning · Statistics 2024-04-16 Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li

Minibatch and Local SGD: Algorithmic Stability and Linear Speedup in Generalization

The increasing scale of data propels the popularity of leveraging parallelism to speed up the optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. The existing…

Machine Learning · Computer Science 2025-10-14 Yunwen Lei, Tao Sun, Mingrui Liu

Stochastic Gradient Made Stable: A Manifold Propagation Approach for Large-Scale Optimization

Stochastic gradient descent (SGD) holds as a classical method to build large scale machine learning models over big data. A stochastic gradient is typically calculated from a limited number of samples (known as mini-batch), so it…

Machine Learning · Computer Science 2016-01-14 Yadong Mu, Wei Liu, Wei Fan

Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

We propose mS2GD: a method incorporating a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent (S2GD). We consider the problem of minimizing a strongly convex function…

Machine Learning · Computer Science 2016-04-20 Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

Stochastic Gradient Descent Meets Distribution Regression

Stochastic gradient descent (SGD) provides a simple and efficient way to solve a broad range of machine learning problems. Here, we focus on distribution regression (DR), involving two stages of sampling: Firstly, we regress from…

Machine Learning · Statistics 2021-03-08 Nicole Mücke

Stochastic Multiple Target Sampling Gradient Descent

Sampling from an unnormalized target distribution is an essential problem with many applications in probabilistic inference. Stein Variational Gradient Descent (SVGD) has been shown to be a powerful method that iteratively updates a set of…

Machine Learning · Computer Science 2023-02-13 Hoang Phan, Ngoc Tran, Trung Le, Toan Tran, Nhat Ho, Dinh Phung

Variance Reduced Stochastic Gradient Descent with Neighbors

Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness,…

Machine Learning · Computer Science 2016-02-29 Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, Brian McWilliams

On the diffusion approximation of nonconvex stochastic gradient descent

We study the Stochastic Gradient Descent (SGD) method in nonconvex optimization problems from the point of view of approximating diffusion processes. We prove rigorously that the diffusion process can approximate the SGD algorithm weakly…

Machine Learning · Statistics 2018-03-06 Wenqing Hu, Chris Junchi Li, Lei Li, Jian-Guo Liu

Gradient Diversity: a Key Ingredient for Scalable Distributed Learning

It has been experimentally observed that distributed implementations of mini-batch stochastic gradient descent (SGD) algorithms exhibit speedup saturation and decaying generalization ability beyond a particular batch-size. In this work, we…

Machine Learning · Computer Science 2018-01-09 Dong Yin, Ashwin Pananjady, Max Lam, Dimitris Papailiopoulos, Kannan Ramchandran, Peter Bartlett

mS2GD: Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

We propose a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent applied to the problem of minimizing a strongly convex composite function represented as the sum of an…

Machine Learning · Computer Science 2014-10-20 Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad, Hamad Alamri, Abdelhamid Bouchachia

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

The stochastic gradient descent (SGD) method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, say $32$-$512$ data points, is…

Machine Learning · Computer Science 2017-02-13 Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang

Efficient Distributed SGD with Variance Reduction

Stochastic Gradient Descent (SGD) has become one of the most popular optimization methods for training machine learning models on massive datasets. However, SGD suffers from two main drawbacks: (i) The noisy gradient updates have high…

Machine Learning · Computer Science 2017-04-10 Soham De, Tom Goldstein

Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms

The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-06 Janis Keuper, Franz-Josef Pfreundt