Related papers: On the Batch Size Selection in Stochastic Gradient…

Adaptive Sampling Strategies for Stochastic Optimization

In this paper, we propose a stochastic optimization method that adaptively controls the sample size used in the computation of gradient approximations. Unlike other variance reduction techniques that either require additional storage or the…

Optimization and Control · Mathematics 2017-11-01 Raghu Bollapragada, Richard Byrd, Jorge Nocedal

Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent

In this paper, we propose a novel approach to automatically determine the batch size in stochastic gradient descent methods. The choice of the batch size induces a trade-off between the accuracy of the gradient estimate and the cost in…

Machine Learning · Computer Science 2017-12-12 Matteo Pirotta, Marcello Restelli

Big Batch SGD: Automated Inference using Adaptive Batch Sizes

Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients make it…

Machine Learning · Computer Science 2017-04-10 Soham De, Abhay Yadav, David Jacobs, Tom Goldstein

Coupling Adaptive Batch Sizes with Learning Rates

Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple…

Machine Learning · Computer Science 2017-06-29 Lukas Balles, Javier Romero, Philipp Hennig

Without-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to…

Machine Learning · Computer Science 2016-10-18 Ohad Shamir

Stochastic ADMM with batch size adaptation for nonconvex nonsmooth optimization

Stochastic alternating direction method of multipliers (SADMM) is a popular method for solving nonconvex nonsmooth optimization in various applications. However, it typically requires an empirical selection of the static batch size for…

Optimization and Control · Mathematics 2026-01-23 Jiachen Jin, Kangkang Deng, Boyu Wang, Hongxia Wang

Incremental Without Replacement Sampling in Nonconvex Optimization

Minibatch decomposition methods for empirical risk minimization are commonly analysed in a stochastic approximation setting, also known as sampling with replacement. On the other hand, modern implementations of such techniques are…

Machine Learning · Computer Science 2023-01-09 Edouard Pauwels

Fast Variance Reduction Method with Stochastic Batch Size

In this paper we study a family of variance reduction methods with randomized batch size: at each step, the algorithm first randomly chooses the batch size and then selects a batch of samples to conduct a variance-reduced stochastic…

Machine Learning · Computer Science 2018-08-08 Xuanqing Liu, Cho-Jui Hsieh

Batch size-invariance for policy optimization

We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated for by changes to other hyperparameters. Stochastic gradient descent is well-known to have this property at small batch sizes, via the…

Machine Learning · Computer Science 2023-03-28 Jacob Hilton, Karl Cobbe, John Schulman

Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation

We study how the batch size affects the total gradient variance in differentially private stochastic gradient descent (DP-SGD), seeking a theoretical explanation for the usefulness of large batch sizes. As DP-SGD is the basis of modern DP…

Machine Learning · Statistics 2024-09-20 Ossi Räisä, Joonas Jälkö, Antti Honkela

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent

Increasing the mini-batch size for stochastic gradient descent offers significant opportunities to reduce wall-clock training time, but there are a variety of theoretical and systems challenges that impede the widespread success of this…

Machine Learning · Computer Science 2018-12-03 Noah Golmant, Nikita Vemuri, Zhewei Yao, Vladimir Feinberg, Amir Gholami, Kai Rothauge, Michael W. Mahoney, Joseph Gonzalez

Adaptive Learning of the Optimal Batch Size of SGD

Recent advances in the theoretical understanding of SGD led to a formula for the optimal batch size minimizing the number of effective data passes, i.e., the number of iterations times the batch size. However, this formula is of no…

Machine Learning · Computer Science 2021-11-22 Motasem Alfarra, Slavomir Hanzely, Alyazeed Albasyoni, Bernard Ghanem, Peter Richtarik

Accelerating Stochastic Gradient Descent Using Antithetic Sampling

(Mini-batch) Stochastic Gradient Descent is a popular optimization method which has been applied to many machine learning applications. But a rather high variance introduced by the stochastic gradient in each step may slow down the…

Machine Learning · Computer Science 2018-10-09 Jingchang Liu, Linli Xu

Gradient-Based Adaptive Stochastic Search for Non-Differentiable Optimization

In this paper, we propose a stochastic search algorithm for solving general optimization problems with little structure. The algorithm iteratively finds high quality solutions by randomly sampling candidate solutions from a parameterized…

Optimization and Control · Mathematics 2013-01-08 Enlu Zhou, Jiaqiao Hu

History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms

Variance-reduced algorithms, although they achieve great theoretical performance, can run slowly in practice due to the periodic gradient estimation with a large batch of data. Batch-size adaptation thus arises as a promising approach to…

Optimization and Control · Mathematics 2020-07-28 Kaiyi Ji, Zhe Wang, Bowen Weng, Yi Zhou, Wei Zhang, Yingbin Liang

Stochastic Learning under Random Reshuffling with Constant Step-sizes

In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data uniformly. Recent works…

Machine Learning · Computer Science 2019-01-30 Bicheng Ying, Kun Yuan, Stefan Vlaski, Ali H. Sayed

On the Role of Batch Size in Stochastic Conditional Gradient Methods

We study the role of batch size in stochastic conditional gradient methods under a μ-Kurdyka-Łojasiewicz (μ-KL) condition. Focusing on momentum-based stochastic conditional gradient algorithms (e.g., Scion), we derive a new…

Machine Learning · Computer Science 2026-03-24 Rustem Islamov, Roman Machacek, Aurelien Lucchi, Antonio Silveti-Falls, Eduard Gorbunov, Volkan Cevher

Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling

Bilevel Optimization has experienced significant advancements recently with the introduction of new efficient algorithms. Mirroring the success in single-level optimization, stochastic gradient-based algorithms are widely used in bilevel…

Optimization and Control · Mathematics 2024-11-12 Junyi Li, Heng Huang

Batched Stochastic Gradient Descent with Weighted Sampling

We analyze a batched variant of Stochastic Gradient Descent (SGD) with weighted sampling distribution for smooth and non-smooth objective functions. We show that by distributing the batches computationally, a significant speedup in the…

Numerical Analysis · Mathematics 2017-03-02 Deanna Needell, Rachel Ward

Improving the convergence of SGD through adaptive batch sizes

Mini-batch stochastic gradient descent (SGD) and variants thereof approximate the objective function's gradient with a small number of training examples, aka the batch size. Small batch sizes require little computation for each model update…

Machine Learning · Computer Science 2023-09-28 Scott Sievert, Shrey Shah