Related papers: Dynamic Batch Adaptation

DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation

The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants…

Machine Learning · Computer Science 2025-09-22 Yuen Chen, Yian Wang, Hari Sundaram

Coupling Adaptive Batch Sizes with Learning Rates

Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple…

Machine Learning · Computer Science 2017-06-29 Lukas Balles, Javier Romero, Philipp Hennig

AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks

Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer…

Machine Learning · Computer Science 2018-02-15 Aditya Devarakonda, Maxim Naumov, Michael Garland

Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems

Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance the fact that attenuating step-size is…

Signal Processing · Electrical Eng. & Systems 2020-07-10 Zhan Gao, Alec Koppel, Alejandro Ribeiro

Stochastic batch size for adaptive regularization in deep network optimization

We propose a first-order stochastic optimization algorithm incorporating adaptive regularization applicable to machine learning problems in the deep learning framework. The adaptive regularization is imposed by a stochastic process in determining…

Machine Learning · Computer Science 2020-04-15 Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong

Diminishing Batch Normalization

In this paper, we propose a generalization of the Batch Normalization (BN) algorithm, diminishing batch normalization (DBN), where we update the BN parameters in a diminishing moving average way. BN is very effective in accelerating the…

Machine Learning · Computer Science 2019-02-20 Yintai Ma, Diego Klabjan

Revisiting Batch Normalization For Practical Domain Adaptation

Deep neural networks (DNN) have shown unprecedented success in various computer vision applications such as image classification and object detection. However, it is still a common annoyance during the training phase, that one has to…

Computer Vision and Pattern Recognition · Computer Science 2016-11-09 Yanghao Li, Naiyan Wang, Jianping Shi, Jiaying Liu, Xiaodi Hou

A Dynamic Sampling Adaptive-SGD Method for Machine Learning

We propose a stochastic optimization method for minimizing loss functions, expressed as an expected value, that adaptively controls the batch size used in the computation of gradient approximations and the step size used to move along such…

Machine Learning · Computer Science 2020-03-04 Achraf Bahamou, Donald Goldfarb

BADM: Batch ADMM for Deep Learning

Stochastic gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence. To address the challenge, we leverage the framework of the alternating direction method of multipliers…

Machine Learning · Computer Science 2025-02-03 Ouya Wang, Shenglong Zhou, Geoffrey Ye Li

Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization

Adaptive gradient methods have been increasingly adopted by the deep learning community due to their fast convergence and reduced sensitivity to hyper-parameters. However, these methods come with limitations, such as increased memory…

Machine Learning · Computer Science 2024-12-17 Corrado Coppola, Lorenzo Papa, Irene Amerini, Laura Palagi

Stochastic Batch Augmentation with An Effective Distilled Dynamic Soft Label Regularizer

Data augmentation has been intensively used in training deep neural networks to improve generalization, whether in the original space (e.g., image space) or representation space. Although being successful, the connection between the…

Machine Learning · Computer Science 2020-06-30 Qian Li, Qingyuan Hu, Yong Qi, Saiyu Qi, Jie Ma, Jian Zhang

Improving Adaptive Moment Optimization via Preconditioner Diagonalization

Modern adaptive optimization methods, such as Adam and its variants, have emerged as the most widely used tools in deep learning over recent years. These algorithms offer automatic mechanisms for dynamically adjusting the update step based…

Machine Learning · Computer Science 2025-02-12 Son Nguyen, Bo Liu, Lizhang Chen, Qiang Liu

Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing

Fine-tuning of large language models (LLMs) shows excellent implications. However, vanilla fine-tuning methods often require intricate data mixtures and repeated experiments for optimal generalization. To address these challenges and streamline…

Computation and Language · Computer Science 2025-10-20 Yang Tang, Ruijie Liu, Yifan Wang, Shiyu Li, Xi Chen

Dynamic Gradient Alignment for Online Data Mixing

The composition of training data mixtures is critical for effectively training large language models (LLMs), as it directly impacts their performance on downstream tasks. Our goal is to identify an optimal data mixture to specialize an LLM…

Machine Learning · Computer Science 2024-10-04 Simin Fan, David Grangier, Pierre Ablin

AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size

This paper presents a novel adaptation of the Stochastic Gradient Descent (SGD), termed AdaBatchGrad. This modification seamlessly integrates an adaptive step size with an adjustable batch size. An increase in batch size and a decrease in…

Machine Learning · Computer Science 2024-02-09 Petr Ostroukhov, Aigerim Zhumabayeva, Chulu Xiang, Alexander Gasnikov, Martin Takáč, Dmitry Kamzolov

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

The delta-bar-delta algorithm is recognized as a learning rate adaptation technique that enhances the convergence speed of the training process in optimization by dynamically scheduling the learning rate based on the difference between the…

Machine Learning · Computer Science 2023-10-18 Zhao Song, Chiwun Yang

Augment your batch: better training with larger batches

Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances…

Machine Learning · Computer Science 2019-01-29 Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry

Neural Network Training via Stochastic Alternating Minimization with Trainable Step Sizes

The training of deep neural networks is inherently a nonconvex optimization problem, yet standard approaches such as stochastic gradient descent (SGD) require simultaneous updates to all parameters, often leading to unstable convergence and…

Machine Learning · Computer Science 2025-08-07 Chengcheng Yan, Jiawei Xu, Zheng Peng, Qingsong Wang

Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change

The choice of hyper-parameters affects the performance of neural models. While much previous research (Sutskever et al., 2013; Duchi et al., 2011; Kingma and Ba, 2015) focuses on accelerating convergence and reducing the effects of the…

Computation and Language · Computer Science 2020-05-06 Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu

Revisiting Small Batch Training for Deep Neural Networks

Modern deep neural network training is typically based on mini-batch stochastic gradient optimization. While the use of large mini-batches increases the available computational parallelism, small batch training has been shown to provide…

Machine Learning · Computer Science 2018-04-23 Dominic Masters, Carlo Luschi