Related papers: On the Batch Size Selection in Stochastic Gradient…

200 papers

In this paper, we propose a stochastic optimization method that adaptively controls the sample size used in the computation of gradient approximations. Unlike other variance reduction techniques that either require additional storage or the…

Optimization and Control · Mathematics 2017-11-01 Raghu Bollapragada, Richard Byrd, Jorge Nocedal

In this paper, we propose a novel approach to automatically determine the batch size in stochastic gradient descent methods. The choice of the batch size induces a trade-off between the accuracy of the gradient estimate and the cost in…

Machine Learning · Computer Science 2017-12-12 Matteo Pirotta, Marcello Restelli

Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients make it…

Machine Learning · Computer Science 2017-04-10 Soham De, Abhay Yadav, David Jacobs, Tom Goldstein

Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple…

Machine Learning · Computer Science 2017-06-29 Lukas Balles, Javier Romero, Philipp Hennig

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to…

Machine Learning · Computer Science 2016-10-18 Ohad Shamir

Stochastic alternating direction method of multipliers (SADMM) is a popular method for solving nonconvex nonsmooth optimization in various applications. However, it typically requires an empirical selection of the static batch size for…

Optimization and Control · Mathematics 2026-01-23 Jiachen Jin, Kangkang Deng, Boyu Wang, Hongxia Wang

Minibatch decomposition methods for empirical risk minimization are commonly analysed in a stochastic approximation setting, also known as sampling with replacement. On the other hand, modern implementations of such techniques are…

Machine Learning · Computer Science 2023-01-09 Edouard Pauwels

In this paper we study a family of variance reduction methods with randomized batch size---at each step, the algorithm first randomly chooses the batch size and then selects a batch of samples to conduct a variance-reduced stochastic…

Machine Learning · Computer Science 2018-08-08 Xuanqing Liu, Cho-Jui Hsieh

We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated for by changes to other hyperparameters. Stochastic gradient descent is well-known to have this property at small batch sizes, via the…

Machine Learning · Computer Science 2023-03-28 Jacob Hilton, Karl Cobbe, John Schulman

We study how the batch size affects the total gradient variance in differentially private stochastic gradient descent (DP-SGD), seeking a theoretical explanation for the usefulness of large batch sizes. As DP-SGD is the basis of modern DP…

Machine Learning · Statistics 2024-09-20 Ossi Räisä, Joonas Jälkö, Antti Honkela

Increasing the mini-batch size for stochastic gradient descent offers significant opportunities to reduce wall-clock training time, but there are a variety of theoretical and systems challenges that impede the widespread success of this…

Recent advances in the theoretical understanding of SGD led to a formula for the optimal batch size minimizing the number of effective data passes, i.e., the number of iterations times the batch size. However, this formula is of no…

Machine Learning · Computer Science 2021-11-22 Motasem Alfarra, Slavomir Hanzely, Alyazeed Albasyoni, Bernard Ghanem, Peter Richtarik

(Mini-batch) Stochastic Gradient Descent is a popular optimization method which has been applied to many machine learning applications. But a rather high variance introduced by the stochastic gradient in each step may slow down the…

Machine Learning · Computer Science 2018-10-09 Jingchang Liu, Linli Xu

In this paper, we propose a stochastic search algorithm for solving general optimization problems with little structure. The algorithm iteratively finds high quality solutions by randomly sampling candidate solutions from a parameterized…

Optimization and Control · Mathematics 2013-01-08 Enlu Zhou, Jiaqiao Hu

Variance-reduced algorithms, although they achieve great theoretical performance, can run slowly in practice due to the periodic gradient estimation with a large batch of data. Batch-size adaptation thus arises as a promising approach to…

Optimization and Control · Mathematics 2020-07-28 Kaiyi Ji, Zhe Wang, Bowen Weng, Yi Zhou, Wei Zhang, Yingbin Liang

In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data uniformly. Recent works…

Machine Learning · Computer Science 2019-01-30 Bicheng Ying, Kun Yuan, Stefan Vlaski, Ali H. Sayed

We study the role of batch size in stochastic conditional gradient methods under a $\mu$-Kurdyka-Łojasiewicz ($\mu$-KL) condition. Focusing on momentum-based stochastic conditional gradient algorithms (e.g., Scion), we derive a new…

Machine Learning · Computer Science 2026-03-24 Rustem Islamov, Roman Machacek, Aurelien Lucchi, Antonio Silveti-Falls, Eduard Gorbunov, Volkan Cevher

Bilevel Optimization has experienced significant advancements recently with the introduction of new efficient algorithms. Mirroring the success in single-level optimization, stochastic gradient-based algorithms are widely used in bilevel…

Optimization and Control · Mathematics 2024-11-12 Junyi Li, Heng Huang

We analyze a batched variant of Stochastic Gradient Descent (SGD) with weighted sampling distribution for smooth and non-smooth objective functions. We show that by distributing the batches computationally, a significant speedup in the…

Numerical Analysis · Mathematics 2017-03-02 Deanna Needell, Rachel Ward

Mini-batch stochastic gradient descent (SGD) and variants thereof approximate the objective function's gradient with a small number of training examples, aka the batch size. Small batch sizes require little computation for each model update…

Machine Learning · Computer Science 2023-09-28 Scott Sievert, Shrey Shah