Related papers: AdaBatch: Efficient Gradient Aggregation Rules for…

Better Mini-Batch Algorithms via Accelerated Gradient Methods

Mini-batch algorithms have been proposed as a way to speed up stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard…

Machine Learning · Computer Science 2011-06-24 Andrew Cotter, Ohad Shamir, Nathan Srebro, Karthik Sridharan

AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size

This paper presents a novel adaptation of the Stochastic Gradient Descent (SGD), termed AdaBatchGrad. This modification seamlessly integrates an adaptive step size with an adjustable batch size. An increase in batch size and a decrease in…

Machine Learning · Computer Science 2024-02-09 Petr Ostroukhov, Aigerim Zhumabayeva, Chulu Xiang, Alexander Gasnikov, Martin Takáč, Dmitry Kamzolov

DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation

The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants…

Machine Learning · Computer Science 2025-09-22 Yuen Chen, Yian Wang, Hari Sundaram

Accelerating Minibatch Stochastic Gradient Descent using Stratified Sampling

Stochastic Gradient Descent (SGD) is a popular optimization method which has been applied to many important machine learning tasks such as Support Vector Machines and Deep Neural Networks. In order to parallelize SGD, minibatch training is…

Machine Learning · Statistics 2014-05-14 Peilin Zhao, Tong Zhang

AdaScale SGD: A User-Friendly Algorithm for Distributed Training

When using large-batch training to speed up stochastic gradient descent, learning rates must adapt to new batch sizes in order to maximize speed-ups and preserve model quality. Re-tuning learning rates is resource intensive, while fixed…

Machine Learning · Computer Science 2020-07-13 Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin

Accelerated Stochastic Gradient Descent for Minimizing Finite Sums

We propose an optimization method for minimizing the finite sums of smooth convex functions. Our method incorporates an accelerated gradient descent (AGD) and a stochastic variance reduction gradient (SVRG) in a mini-batch setting. Unlike…

Machine Learning · Statistics 2015-06-11 Atsushi Nitanda

AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods

The choice of batch sizes in minibatch stochastic gradient optimizers is critical in large-scale model training for both optimization and generalization performance. Although large-batch training is arguably the dominant training paradigm…

Machine Learning · Computer Science 2024-05-29 Tim Tsz-Kit Lau, Han Liu, Mladen Kolar

Scaling Distributed Training with Adaptive Summation

Stochastic gradient descent (SGD) is an inherently sequential training algorithm--computing the gradient at batch $i$ depends on the model parameters learned from batch $i-1$. Prior approaches that break this dependence do not honor them…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-05 Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum

Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

We propose mS2GD: a method incorporating a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent (S2GD). We consider the problem of minimizing a strongly convex function…

Machine Learning · Computer Science 2016-04-20 Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

mS2GD: Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

We propose a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent applied to the problem of minimizing a strongly convex composite function represented as the sum of an…

Machine Learning · Computer Science 2014-10-20 Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling

Machine learning, especially deep neural networks, has been rapidly developed in fields including computer vision, speech recognition and reinforcement learning. Although Mini-batch SGD is one of the most popular stochastic optimization…

Machine Learning · Computer Science 2019-03-12 Xinyu Peng, Li Li, Fei-Yue Wang

Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization

In this paper, we develop a new accelerated stochastic gradient method for efficiently solving the convex regularized empirical risk minimization problem in mini-batch settings. The use of mini-batches is becoming a golden standard in the…

Optimization and Control · Mathematics 2017-09-20 Tomoya Murata, Taiji Suzuki

Universality of AdaGrad Stepsizes for Stochastic Optimization: Inexact Oracle, Acceleration and Variance Reduction

We present adaptive gradient methods (both basic and accelerated) for solving convex composite optimization problems in which the main part is approximately smooth (a.k.a. $(\delta, L)$-smooth) and can be accessed only via a (potentially…

Optimization and Control · Mathematics 2024-06-11 Anton Rodomanov, Xiaowen Jiang, Sebastian Stich

Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients

Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes all needs for tuning, while automatically reducing learning rates over time on…

Machine Learning · Computer Science 2013-03-28 Tom Schaul, Yann LeCun

Variance Reduction with Sparse Gradients

Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients to reduce the variance of stochastic gradients. Compared to SGD, these methods require at least double the number of operations per…

Machine Learning · Computer Science 2020-01-28 Melih Elibol, Lihua Lei, Michael I. Jordan

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

We propose a new stochastic optimization framework for empirical risk minimization problems such as those that arise in machine learning. The traditional approaches, such as (mini-batch) stochastic gradient descent (SGD), utilize an…

Machine Learning · Statistics 2020-02-04 Kenji Kawaguchi, Haihao Lu

A Novel Stochastic Stratified Average Gradient Method: Convergence Rate and Its Complexity

SGD (Stochastic Gradient Descent) is a popular algorithm for large scale optimization problems due to its low iterative cost. However, SGD cannot achieve a linear convergence rate as FGD (Full Gradient Descent) does because of the inherent…

Machine Learning · Computer Science 2017-12-05 Aixiang Chen, Bingchuan Chen, Xiaolong Chai, Rui Bian, Hengguang Li

Stochastic Gradient Made Stable: A Manifold Propagation Approach for Large-Scale Optimization

Stochastic gradient descent (SGD) holds as a classical method to build large scale machine learning models over big data. A stochastic gradient is typically calculated from a limited number of samples (known as mini-batch), so it…

Machine Learning · Computer Science 2016-01-14 Yadong Mu, Wei Liu, Wei Fan

Big Batch SGD: Automated Inference using Adaptive Batch Sizes

Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients makes it…

Machine Learning · Computer Science 2017-04-10 Soham De, Abhay Yadav, David Jacobs, Tom Goldstein

Gradient Diversity: a Key Ingredient for Scalable Distributed Learning

It has been experimentally observed that distributed implementations of mini-batch stochastic gradient descent (SGD) algorithms exhibit speedup saturation and decaying generalization ability beyond a particular batch-size. In this work, we…

Machine Learning · Computer Science 2018-01-09 Dong Yin, Ashwin Pananjady, Max Lam, Dimitris Papailiopoulos, Kannan Ramchandran, Peter Bartlett