Related papers: Mini-batch Gradient Descent with Buffer

Statistical Analysis of Fixed Mini-Batch Gradient Descent Estimator

We study a fixed mini-batch gradient descent (FMGD) algorithm for solving optimization problems with massive datasets. In FMGD, the whole sample is split into multiple non-overlapping partitions. Once the partitions are formed, they are…

Computation · Statistics · 2023-04-17 · Haobo Qi, Feifei Wang, Hansheng Wang

Multiplexed gradient descent: Fast online training of modern datasets on hardware neural networks without backpropagation

We present multiplexed gradient descent (MGD), a gradient descent framework designed to easily train analog or digital neural networks in hardware. MGD utilizes zero-order optimization techniques for online training of hardware neural…

Machine Learning · Computer Science · 2023-08-17 · Adam N. McCaughan, Bakhrom G. Oripov, Natesh Ganesh, Sae Woo Nam, Andrew Dienstfrey, Sonia M. Buckley

MBGDT: Robust Mini-Batch Gradient Descent

In high dimensions, most machine learning methods are fragile even when only a few outliers are present. To address this, we introduce a new method built on a base learner, such as Bayesian regression or stochastic gradient descent, to solve…

Machine Learning · Computer Science · 2022-06-16 · Hanming Wang, Haozheng Luo, Yue Wang

Scaling of hardware-compatible perturbative training algorithms

In this work, we explore the capabilities of multiplexed gradient descent (MGD), a scalable and efficient perturbative zeroth-order training method for estimating the gradient of a loss function in hardware and training it via stochastic…

Machine Learning · Computer Science · 2025-05-01 · Bakhrom G. Oripov, Andrew Dienstfrey, Adam N. McCaughan, Sonia M. Buckley

The Impact of the Mini-batch Size on the Variance of Gradients in Stochastic Gradient Descent

The mini-batch stochastic gradient descent (SGD) algorithm is widely used in training machine learning models, in particular deep learning models. We study SGD dynamics under linear regression and two-layer linear networks, with an easy…

Optimization and Control · Mathematics · 2020-04-29 · Xin Qian, Diego Klabjan

Faster SGD training by minibatch persistency

It is well known that, for most datasets, the use of large-size minibatches for Stochastic Gradient Descent (SGD) typically leads to slow convergence and poor generalization. On the other hand, large minibatches are of great practical…

Machine Learning · Computer Science · 2018-06-20 · Matteo Fischetti, Iacopo Mandatelli, Domenico Salvagnin

Optimizing ML Training with Metagradient Descent

A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based…

Machine Learning · Statistics · 2025-03-19 · Logan Engstrom, Andrew Ilyas, Benjamin Chen, Axel Feldmann, William Moses, Aleksander Madry

mS2GD: Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

We propose a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent applied to the problem of minimizing a strongly convex composite function represented as the sum of an…

Machine Learning · Computer Science · 2014-10-20 · Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

Improving the convergence of SGD through adaptive batch sizes

Mini-batch stochastic gradient descent (SGD) and its variants approximate the objective function's gradient using a small number of training examples, known as the batch size. Small batch sizes require little computation for each model update…

Machine Learning · Computer Science · 2023-09-28 · Scott Sievert, Shrey Shah

Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

We propose mS2GD: a method incorporating a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent (S2GD). We consider the problem of minimizing a strongly convex function…

Machine Learning · Computer Science · 2016-04-20 · Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

ABS-SGD: A Delayed Synchronous Stochastic Gradient Descent Algorithm with Adaptive Batch Size for Heterogeneous GPU Clusters

As the size of models and datasets grows, it has become increasingly common to train models in parallel. However, existing distributed stochastic gradient descent (SGD) algorithms suffer from insufficient utilization of computational…

Machine Learning · Computer Science · 2023-08-30 · Xin Zhou, Ling Chen, Houming Wu

Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training

Stochastic gradient descent (SGD) and its variants have been the dominant optimization methods in machine learning. Compared to SGD with small-batch training, SGD with large-batch training can better utilize the computational power of…

Machine Learning · Statistics · 2024-04-16 · Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li

DIVEBATCH: Accelerating Model Training Through Gradient-Diversity Aware Batch Size Adaptation

The goal of this paper is to accelerate the training of machine learning models, a critical challenge since the training of large-scale deep neural models can be computationally expensive. Stochastic gradient descent (SGD) and its variants…

Machine Learning · Computer Science · 2025-09-22 · Yuen Chen, Yian Wang, Hari Sundaram

Blended Coarse Gradient Descent for Full Quantization of Deep Neural Networks

Quantized deep neural networks (QDNNs) are attractive due to their much lower memory storage and faster inference speed than their regular full precision counterparts. To maintain the same performance level especially at low bit-widths,…

Machine Learning · Computer Science · 2019-01-08 · Penghang Yin, Shuai Zhang, Jiancheng Lyu, Stanley Osher, Yingyong Qi, Jack Xin

Patch Gradient Descent: Training Neural Networks on Very Large Images

Traditional CNN models are trained and tested on relatively low resolution images (<300 px), and cannot be directly operated on large-scale images due to compute and memory constraints. We propose Patch Gradient Descent (PatchGD), an…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-01 · Deepak K. Gupta, Gowreesh Mago, Arnav Chavan, Dilip K. Prasad

Stochastic Gradient Descent on Highly-Parallel Architectures

There is an increased interest in building data analytics frameworks with advanced algebraic capabilities both in industry and academia. Many of these frameworks, e.g., TensorFlow and BIDMach, implement their compute-intensive primitives in…

Databases · Computer Science · 2018-02-27 · Yujing Ma, Florin Rusu, Martin Torres

Amortized Analysis on Asynchronous Gradient Descent

Gradient descent is an important class of iterative algorithms for minimizing convex functions. Classically, gradient descent has been a sequential and synchronous process. Distributed and asynchronous variants of gradient descent have been…

Optimization and Control · Mathematics · 2014-12-02 · Yun Kuen Cheung, Richard Cole

Block Acceleration Without Momentum: On Optimal Stepsizes of Block Gradient Descent for Least-Squares

Block coordinate descent is a powerful algorithmic template suitable for big data optimization. This template admits a lot of variants including block gradient descent (BGD), which performs gradient descent on a selected block of variables,…

Optimization and Control · Mathematics · 2024-05-28 · Liangzu Peng, Wotao Yin

A block-random algorithm for learning on distributed, heterogeneous data

Most deep learning models are based on deep neural networks with multiple layers between input and output. The parameters defining these layers are initialized using random values and are "learned" from data, typically using stochastic…

Machine Learning · Computer Science · 2019-03-05 · Prakash Mohan, Marc T. Henry de Frahan, Ryan King, Ray W. Grout

Scaling transition from momentum stochastic gradient descent to plain stochastic gradient descent

Plain stochastic gradient descent and momentum stochastic gradient descent are widely used in deep learning because of their simple settings and low computational complexity. Momentum stochastic gradient descent uses…

Machine Learning · Computer Science · 2021-06-15 · Kun Zeng, Jinlan Liu, Zhixia Jiang, Dongpo Xu