Related papers: Small batch deep reinforcement learning

200 papers

Modern deep neural network training is typically based on mini-batch stochastic gradient optimization. While the use of large mini-batches increases the available computational parallelism, small batch training has been shown to provide…

Machine Learning · Computer Science 2018-04-23 Dominic Masters, Carlo Luschi

We study the role of an essential hyper-parameter that governs the training of Transformers for neural machine translation in a low-resource setting: the batch size. Using theoretical insights and experimental evidence, we argue against the…

Computation and Language · Computer Science 2022-03-22 Àlex R. Atrio, Andrei Popescu-Belis

Stochastic gradient descent with momentum (SGDM), in which a momentum term is added to SGD, has been well studied in both theory and practice. The theoretical studies show that the settings of the learning rate and momentum weight affect…

Machine Learning · Computer Science 2025-09-25 Keisuke Kamo, Hideaki Iiduka

In an increasing number of domains it has been demonstrated that deep learning models can be trained using relatively large batch sizes without sacrificing data efficiency. However the limits of this massive data parallelism seem to differ…

Machine Learning · Computer Science 2018-12-18 Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team

Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer…

Machine Learning · Computer Science 2018-02-15 Aditya Devarakonda, Maxim Naumov, Michael Garland

Recent deep learning models are difficult to train using a large batch size, because commodity machines may not have enough memory to accommodate both the model and a large data batch size. The batch size is one of the hyper-parameters used…

Machine Learning · Computer Science 2024-07-03 XinYu Piao, DoangJoo Synn, JooYoung Park, Jong-Kook Kim

Increasing the mini-batch size for stochastic gradient descent offers significant opportunities to reduce wall-clock training time, but there are a variety of theoretical and systems challenges that impede the widespread success of this…

Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple…

Machine Learning · Computer Science 2017-06-29 Lukas Balles, Javier Romero, Philipp Hennig

The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants…

Machine Learning · Computer Science 2025-09-22 Yuen Chen, Yian Wang, Hari Sundaram

Large-batch training is an efficient approach for current distributed deep learning systems. It has enabled researchers to reduce the ImageNet/ResNet-50 training from 29 hours to around 1 minute. In this paper, we focus on studying the…

Machine Learning · Computer Science 2020-06-16 Yang You, Yuhui Wang, Huan Zhang, Zhao Zhang, James Demmel, Cho-Jui Hsieh

Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances…

Machine Learning · Computer Science 2019-01-29 Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry

Recent hardware developments have dramatically increased the scale of data parallelism available for neural network training. Among the simplest ways to harness next-generation hardware is to increase the batch size in standard mini-batch…

Machine Learning · Computer Science 2019-07-22 Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl

The batch size is an essential parameter to tune during the development of new neural networks. Amongst other quality indicators, it has a large degree of influence on the model's accuracy, generalisability, training times and…

Machine Learning · Computer Science 2023-07-24 Tim Yarally, Luís Cruz, Daniel Feitosa, June Sallou, Arie van Deursen

The use of mini-batches of data in training artificial neural networks is nowadays very common. Despite its broad usage, theories explaining quantitatively how large or small the optimal mini-batch size should be are missing. This work…

Disordered Systems and Neural Networks · Physics 2024-01-17 Raffaele Marino, Federico Ricci-Tersenghi

We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated for by changes to other hyperparameters. Stochastic gradient descent is well-known to have this property at small batch sizes, via the…

Machine Learning · Computer Science 2023-03-28 Jacob Hilton, Karl Cobbe, John Schulman

Production scheduling is an essential task in manufacturing, with Reinforcement Learning (RL) emerging as a key solution. In a previous work, RL was utilized to solve an extended permutation flow shop scheduling problem (PFSSP) for a…

Machine Learning · Computer Science 2024-06-05 Arthur Müller, Felix Grumbach, Matthia Sabatelli

Foundation models in speech are often trained using many GPUs, which implicitly leads to large effective batch sizes. In this paper we study the effect of batch size on pre-training, both in terms of statistics that can be monitored during…

Sound · Computer Science 2024-02-22 Nik Vaessen, David A. van Leeuwen

It is held as a truism that deep neural networks require large datasets to train effective models. However, large datasets, especially with high-quality labels, can be expensive to obtain. This study sets out to investigate (i) how large a…

Information Retrieval · Computer Science 2019-01-31 Trond Linjordet, Krisztian Balog

When using active learning, smaller batch sizes are typically more efficient from a learning efficiency perspective. However, in practice due to speed and human annotator considerations, the use of larger batch sizes is necessary. While…

Machine Learning · Computer Science 2018-05-18 Garrett Beatty, Ethan Kochis, Michael Bloodgood

We propose a first-order stochastic optimization algorithm incorporating adaptive regularization applicable to machine learning problems in deep learning framework. The adaptive regularization is imposed by stochastic process in determining…

Machine Learning · Computer Science 2020-04-15 Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong