Related papers: Dynamic Batch Adaptation

200 papers

The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants…

Machine Learning · Computer Science · 2025-09-22 · Yuen Chen, Yian Wang, Hari Sundaram

Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple…

Machine Learning · Computer Science · 2017-06-29 · Lukas Balles, Javier Romero, Philipp Hennig

Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer…

Machine Learning · Computer Science · 2018-02-15 · Aditya Devarakonda, Maxim Naumov, Michael Garland

Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance the fact that attenuating step-size is…

Signal Processing · Electrical Eng. & Systems · 2020-07-10 · Zhan Gao, Alec Koppel, Alejandro Ribeiro

We propose a first-order stochastic optimization algorithm incorporating adaptive regularization applicable to machine learning problems in the deep learning framework. The adaptive regularization is imposed by a stochastic process in determining…

Machine Learning · Computer Science · 2020-04-15 · Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong

In this paper, we propose a generalization of the Batch Normalization (BN) algorithm, diminishing batch normalization (DBN), where we update the BN parameters in a diminishing moving average way. BN is very effective in accelerating the…

Machine Learning · Computer Science · 2019-02-20 · Yintai Ma, Diego Klabjan

Deep neural networks (DNNs) have shown unprecedented success in various computer vision applications such as image classification and object detection. However, it is still a common annoyance during the training phase that one has to…

Computer Vision and Pattern Recognition · Computer Science · 2016-11-09 · Yanghao Li, Naiyan Wang, Jianping Shi, Jiaying Liu, Xiaodi Hou

We propose a stochastic optimization method for minimizing loss functions, expressed as an expected value, that adaptively controls the batch size used in the computation of gradient approximations and the step size used to move along such…

Machine Learning · Computer Science · 2020-03-04 · Achraf Bahamou, Donald Goldfarb

Stochastic gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence. To address the challenge, we leverage the framework of the alternating direction method of multipliers…

Machine Learning · Computer Science · 2025-02-03 · Ouya Wang, Shenglong Zhou, Geoffrey Ye Li

Adaptive gradient methods have been increasingly adopted by the deep learning community due to their fast convergence and reduced sensitivity to hyper-parameters. However, these methods come with limitations, such as increased memory…

Machine Learning · Computer Science · 2024-12-17 · Corrado Coppola, Lorenzo Papa, Irene Amerini, Laura Palagi

Data augmentation has been intensively used in training deep neural networks to improve generalization, whether in the original space (e.g., image space) or the representation space. Although successful, the connection between the…

Machine Learning · Computer Science · 2020-06-30 · Qian Li, Qingyuan Hu, Yong Qi, Saiyu Qi, Jie Ma, Jian Zhang

Modern adaptive optimization methods, such as Adam and its variants, have emerged as the most widely used tools in deep learning over recent years. These algorithms offer automatic mechanisms for dynamically adjusting the update step based…

Machine Learning · Computer Science · 2025-02-12 · Son Nguyen, Bo Liu, Lizhang Chen, Qiang Liu

Large language model (LLM) fine-tuning shows excellent implications. However, vanilla fine-tuning methods often require intricate data mixtures and repeated experiments for optimal generalization. To address these challenges and streamline…

Computation and Language · Computer Science · 2025-10-20 · Yang Tang, Ruijie Liu, Yifan Wang, Shiyu Li, Xi Chen

The composition of training data mixtures is critical for effectively training large language models (LLMs), as it directly impacts their performance on downstream tasks. Our goal is to identify an optimal data mixture to specialize an LLM…

Machine Learning · Computer Science · 2024-10-04 · Simin Fan, David Grangier, Pierre Ablin

This paper presents a novel adaptation of Stochastic Gradient Descent (SGD), termed AdaBatchGrad. This modification seamlessly integrates an adaptive step size with an adjustable batch size. An increase in batch size and a decrease in…

Machine Learning · Computer Science · 2024-02-09 · Petr Ostroukhov, Aigerim Zhumabayeva, Chulu Xiang, Alexander Gasnikov, Martin Takáč, Dmitry Kamzolov

The delta-bar-delta algorithm is recognized as a learning rate adaptation technique that enhances the convergence speed of the training process in optimization by dynamically scheduling the learning rate based on the difference between the…

Machine Learning · Computer Science · 2023-10-18 · Zhao Song, Chiwun Yang

Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances…

Machine Learning · Computer Science · 2019-01-29 · Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry

The training of deep neural networks is inherently a nonconvex optimization problem, yet standard approaches such as stochastic gradient descent (SGD) require simultaneous updates to all parameters, often leading to unstable convergence and…

Machine Learning · Computer Science · 2025-08-07 · Chengcheng Yan, Jiawei Xu, Zheng Peng, Qingsong Wang

The choice of hyper-parameters affects the performance of neural models. While much previous research (Sutskever et al., 2013; Duchi et al., 2011; Kingma and Ba, 2015) focuses on accelerating convergence and reducing the effects of the…

Computation and Language · Computer Science · 2020-05-06 · Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu

Modern deep neural network training is typically based on mini-batch stochastic gradient optimization. While the use of large mini-batches increases the available computational parallelism, small batch training has been shown to provide…

Machine Learning · Computer Science · 2018-04-23 · Dominic Masters, Carlo Luschi