Related papers: Hyper-Learning for Gradient-Based Batch Size Adapt…

DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation

The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants…

Machine Learning · Computer Science 2025-09-22 Yuen Chen, Yian Wang, Hari Sundaram

Automated Learning Rate Scheduler for Large-batch Training

Large-batch training has been essential in leveraging large-scale datasets and models in deep learning. While it is computationally beneficial to use large batch sizes, it often requires a specially designed learning rate (LR) schedule to…

Machine Learning · Computer Science 2021-07-14 Chiheon Kim, Saehoon Kim, Jongmin Kim, Donghoon Lee, Sungwoong Kim

Gradient Descent with Provably Tuned Learning-rate Schedules

Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one typically sets them using heuristic approaches…

Machine Learning · Computer Science 2025-12-05 Dravyansh Sharma

Stochastic batch size for adaptive regularization in deep network optimization

We propose a first-order stochastic optimization algorithm incorporating adaptive regularization applicable to machine learning problems in deep learning framework. The adaptive regularization is imposed by stochastic process in determining…

Machine Learning · Computer Science 2020-04-15 Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong

Coupling Adaptive Batch Sizes with Learning Rates

Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple…

Machine Learning · Computer Science 2017-06-29 Lukas Balles, Javier Romero, Philipp Hennig

Differentiable Self-Adaptive Learning Rate

Learning rate adaptation is a popular topic in machine learning. Gradient Descent trains neural network with a fixed learning rate. Learning rate adaptation is proposed to accelerate the training process through adjusting the step size in…

Machine Learning · Computer Science 2022-10-20 Bozhou Chen, Hongzhi Wang, Chenmin Ba

Selection Hyper-heuristics Can Automatically Adjust the Learning Period to Optimally Solve Pseudo-Boolean Problems

The Random Gradient hyper-heuristic was recently shown to be able to learn the optimal neighbourhood size when optimizing the LeadingOnes benchmark via the Randomised Local Search (RLS) meta-heuristic. However, for this to happen, a…

Neural and Evolutionary Computing · Computer Science 2026-05-29 Benjamin Doerr, Pietro S. Oliveto, John Alasdair Warwicker

Submodular Batch Selection for Training Deep Neural Networks

Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation…

Machine Learning · Computer Science 2019-06-21 K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian

Adaptive Batch Sizes for Active Learning: A Probabilistic Numerics Approach

Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation. This fixed approach is inefficient because of a dynamic trade-off between cost and speed -- larger batches are more…

Machine Learning · Computer Science 2024-10-15 Masaki Adachi, Satoshi Hayakawa, Martin Jørgensen, Xingchen Wan, Vu Nguyen, Harald Oberhauser, Michael A. Osborne

AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks

Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer…

Machine Learning · Computer Science 2018-02-15 Aditya Devarakonda, Maxim Naumov, Michael Garland

Increasing Both Batch Size and Learning Rate Accelerates Stochastic Gradient Descent

The performance of mini-batch stochastic gradient descent (SGD) strongly depends on setting the batch size and learning rate to minimize the empirical loss in training the deep neural network. In this paper, we present theoretical analyses…

Machine Learning · Computer Science 2025-02-17 Hikaru Umeda, Hideaki Iiduka

Augment your batch: better training with larger batches

Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances…

Machine Learning · Computer Science 2019-01-29 Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry

Simple and Effective Gradient-Based Tuning of Sequence-to-Sequence Models

Recent trends towards training ever-larger language models have substantially improved machine learning performance across linguistic tasks. However, the huge cost of training larger models can make tuning them prohibitively expensive,…

Computation and Language · Computer Science 2022-09-13 Jared Lichtarge, Chris Alberti, Shankar Kumar

Big Batch SGD: Automated Inference using Adaptive Batch Sizes

Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients makes it…

Machine Learning · Computer Science 2017-04-10 Soham De, Abhay Yadav, David Jacobs, Tom Goldstein

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise

The choice of batch-size in a stochastic optimization algorithm plays a substantial role for both optimization and generalization. Increasing the batch-size used typically improves optimization but degrades generalization. To address the…

Machine Learning · Computer Science 2020-03-03 Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba

Large batch size training of neural networks with adversarial training and second-order information

The most straightforward method to accelerate Stochastic Gradient Descent (SGD) computation is to distribute the randomly selected batch of inputs over multiple processors. To keep the distributed processors fully utilized requires…

Machine Learning · Computer Science 2020-01-06 Zhewei Yao, Amir Gholami, Daiyaan Arfeen, Richard Liaw, Joseph Gonzalez, Kurt Keutzer, Michael Mahoney

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model

Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns. In this work, we study how the critical batch size changes based on…

Machine Learning · Computer Science 2019-10-29 Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger Grosse

Dynamic Batch Adaptation

Current deep learning adaptive optimizer methods adjust the step magnitude of parameter updates by altering the effective learning rate used by each parameter. Motivated by the known inverse relation between batch size and learning rate on…

Machine Learning · Computer Science 2022-08-02 Cristian Simionescu, George Stoica, Robert Herscovici

Massive Dimensions Reduction and Hybridization with Meta-heuristics in Deep Learning

Deep learning is mainly based on utilizing gradient-based optimization for training Deep Neural Network (DNN) models. Although robust and widely used, gradient-based optimization algorithms are prone to getting stuck in local minima. In…

Neural and Evolutionary Computing · Computer Science 2024-08-15 Rasa Khosrowshahli, Shahryar Rahnamayan, Beatrice Ombuki-Berman

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

The delta-bar-delta algorithm is recognized as a learning rate adaptation technique that enhances the convergence speed of the training process in optimization by dynamically scheduling the learning rate based on the difference between the…

Machine Learning · Computer Science 2023-10-18 Zhao Song, Chiwun Yang