Related papers: Stochastic batch size for adaptive regularization …

Coupling Adaptive Batch Sizes with Learning Rates

Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple…

Machine Learning · Computer Science 2017-06-29 Lukas Balles, Javier Romero, Philipp Hennig

Adaptive Regularization via Residual Smoothing in Deep Learning Optimization

We present an adaptive regularization algorithm that can be effectively applied to the optimization problem in deep learning framework. Our regularization algorithm aims to take into account the fitness of data to the current state of model…

Machine Learning · Computer Science 2019-09-02 Junghee Cho, Junseok Kwon, Byung-Woo Hong

AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods

The choice of batch sizes in minibatch stochastic gradient optimizers is critical in large-scale model training for both optimization and generalization performance. Although large-batch training is arguably the dominant training paradigm…

Machine Learning · Computer Science 2024-05-29 Tim Tsz-Kit Lau, Han Liu, Mladen Kolar

A Guide to Stochastic Optimisation for Large-Scale Inverse Problems

Stochastic optimisation algorithms are the de facto standard for machine learning with large amounts of data. Handling only a subset of available data in each optimisation step dramatically reduces the per-iteration computational costs,…

Numerical Analysis · Mathematics 2024-12-19 Matthias J. Ehrhardt, Zeljko Kereta, Jingwei Liang, Junqi Tang

Adaptive Batch Sizes for Active Learning: A Probabilistic Numerics Approach

Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation. This fixed approach is inefficient because of a dynamic trade-off between cost and speed -- larger batches are more…

Machine Learning · Computer Science 2024-10-15 Masaki Adachi, Satoshi Hayakawa, Martin Jørgensen, Xingchen Wan, Vu Nguyen, Harald Oberhauser, Michael A. Osborne

Stochastic Nonconvex Optimization with Large Minibatches

We study stochastic optimization of nonconvex loss functions, which are typical objectives for training neural networks. We propose stochastic approximation algorithms which optimize a series of regularized, nonlinearized losses on large…

Machine Learning · Computer Science 2019-03-12 Weiran Wang, Nathan Srebro

Variance Regularization for Accelerating Stochastic Optimization

While nowadays most gradient-based optimization methods focus on exploring the high-dimensional geometric features, the random error accumulated in a stochastic version of any algorithm implementation has not been stressed yet. In this…

Machine Learning · Computer Science 2020-08-14 Tong Yang, Long Sha, Pengyu Hong

Adaptive Sampling Strategies for Stochastic Optimization

In this paper, we propose a stochastic optimization method that adaptively controls the sample size used in the computation of gradient approximations. Unlike other variance reduction techniques that either require additional storage or the…

Optimization and Control · Mathematics 2017-11-01 Raghu Bollapragada, Richard Byrd, Jorge Nocedal

Dynamic Batch Adaptation

Current deep learning adaptive optimizer methods adjust the step magnitude of parameter updates by altering the effective learning rate used by each parameter. Motivated by the known inverse relation between batch size and learning rate on…

Machine Learning · Computer Science 2022-08-02 Cristian Simionescu, George Stoica, Robert Herscovici

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise

The choice of batch-size in a stochastic optimization algorithm plays a substantial role for both optimization and generalization. Increasing the batch-size used typically improves optimization but degrades generalization. To address the…

Machine Learning · Computer Science 2020-03-03 Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba

Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models

Stochastic optimization lies at the core of most statistical learning models. The recent great development of stochastic algorithmic tools focused significantly onto proximal gradient iterations, in order to find an efficient approach for…

Machine Learning · Computer Science 2020-03-31 Andrei Patrascu, Ciprian Paduraru, Paul Irofti

AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks

Training deep neural networks with Stochastic Gradient Descent, or its variants, requires careful choice of both learning rate and batch size. While smaller batch sizes generally converge in fewer training epochs, larger batch sizes offer…

Machine Learning · Computer Science 2018-02-15 Aditya Devarakonda, Maxim Naumov, Michael Garland

Unlocking optimal batch size schedules using continuous-time control and perturbation theory

Stochastic Gradient Descent (SGD) and its variants are almost universally used to train neural networks and to fit a variety of other parametric models. An important hyperparameter in this context is the batch size, which determines how…

Optimization and Control · Mathematics 2023-12-05 Stefan Perko

Adaptive Learning of the Optimal Batch Size of SGD

Recent advances in the theoretical understanding of SGD led to a formula for the optimal batch size minimizing the number of effective data passes, i.e., the number of iterations times the batch size. However, this formula is of no…

Machine Learning · Computer Science 2021-11-22 Motasem Alfarra, Slavomir Hanzely, Alyazeed Albasyoni, Bernard Ghanem, Peter Richtarik

Gradient-Based Adaptive Stochastic Search for Non-Differentiable Optimization

In this paper, we propose a stochastic search algorithm for solving general optimization problems with little structure. The algorithm iteratively finds high quality solutions by randomly sampling candidate solutions from a parameterized…

Optimization and Control · Mathematics 2013-01-08 Enlu Zhou, Jiaqiao Hu

A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization

We describe a framework for deriving and analyzing online optimization algorithms that incorporate adaptive, data-dependent regularization, also termed preconditioning. Such algorithms have been proven useful in stochastic optimization by…

Machine Learning · Computer Science 2017-06-21 Vineet Gupta, Tomer Koren, Yoram Singer

Stochastic Learning Rate Optimization in the Stochastic Approximation and Online Learning Settings

In this work, multiplicative stochasticity is applied to the learning rate of stochastic optimization algorithms, giving rise to stochastic learning-rate schemes. In-expectation theoretical convergence results of Stochastic Gradient Descent…

Optimization and Control · Mathematics 2022-03-22 Theodoros Mamalis, Dusan Stipanovic, Petros Voulgaris

Adaptive Batch Size and Learning Rate Scheduler for Stochastic Gradient Descent Based on Minimization of Stochastic First-order Oracle Complexity

The convergence behavior of mini-batch stochastic gradient descent (SGD) is highly sensitive to the batch size and learning rate settings. Recent theoretical studies have identified the existence of a critical batch size that minimizes…

Machine Learning · Computer Science 2025-08-08 Hikaru Umeda, Hideaki Iiduka

Robust Sampling in Deep Learning

Deep learning requires regularization mechanisms to reduce overfitting and improve generalization. We address this problem by a new regularization method based on distributional robust optimization. The key idea is to modify the…

Machine Learning · Computer Science 2020-06-08 Aurora Cobo Aguilera, Antonio Artés-Rodríguez, Fernando Pérez-Cruz, Pablo Martínez Olmos

Adaptation and learning over networks under subspace constraints -- Part I: Stability Analysis

This paper considers optimization problems over networks where agents have individual objectives to meet, or individual parameter vectors to estimate, subject to subspace constraints that require the objectives across the network to lie in…

Multiagent Systems · Computer Science 2020-04-22 Roula Nassif, Stefan Vlaski, Ali H. Sayed