Related papers: A Bootstrap Algorithm for Fast Supervised Learning

Masked Training of Neural Networks with Partial Gradients

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

Instant Learning: Parallel Deep Neural Networks and Convolutional Bootstrapping

Although deep neural networks (DNN) are able to scale with direct advances in computational power (e.g., memory and processing speed), they are not well suited to exploit the recent trends for parallel architectures. In particular, gradient…

Machine Learning · Computer Science 2016-05-24 Andrew J. R. Simpson

Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

Reinforced stochastic gradient descent for deep neural network learning

Stochastic gradient descent (SGD) is a standard optimization method to minimize a training error with respect to network parameters in modern neural network learning. However, it typically suffers from proliferation of saddle points in the…

Machine Learning · Computer Science 2017-11-23 Haiping Huang, Taro Toyoizumi

Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks

Adaptive gradient methods, which adopt historical gradient information to automatically adjust the learning rate, despite the nice property of fast convergence, have been observed to generalize worse than stochastic gradient descent (SGD)…

Machine Learning · Computer Science 2020-06-24 Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Stochastic Training of Neural Networks via Successive Convex Approximations

This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA)…

Machine Learning · Statistics 2017-06-16 Simone Scardapane, Paolo Di Lorenzo

On the Convergence of SGD Training of Neural Networks

Neural networks are usually trained by some form of stochastic gradient descent (SGD). A number of strategies are in common use intended to improve SGD optimization, such as learning rate schedules, momentum, and batching. These are…

Neural and Evolutionary Computing · Computer Science 2015-08-13 Thomas M. Breuel

Improving SGD convergence by online linear regression of gradients in multiple statistically relevant directions

Deep neural networks are usually trained with stochastic gradient descent (SGD), which minimizes objective function using very rough approximations of gradient, only averaging to the real gradient. Standard approaches like momentum or ADAM…

Machine Learning · Computer Science 2023-03-14 Jarek Duda

Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks

Stochastic gradient descent (SGD) is the main approach for training deep networks: it moves towards the optimum of the cost function by iteratively updating the parameters of a model in the direction of the gradient of the loss evaluated on…

Machine Learning · Computer Science 2021-03-30 Loris Nanni, Gianluca Maguolo, Alessandra Lumini

A Dynamic Sampling Adaptive-SGD Method for Machine Learning

We propose a stochastic optimization method for minimizing loss functions, expressed as an expected value, that adaptively controls the batch size used in the computation of gradient approximations and the step size used to move along such…

Machine Learning · Computer Science 2020-03-04 Achraf Bahamou, Donald Goldfarb

Training Overparametrized Neural Networks in Sublinear Time

The success of deep learning comes at a tremendous computational and energy cost, and the scalability of training massively overparametrized neural networks is becoming a real barrier to the progress of artificial intelligence (AI). Despite…

Machine Learning · Computer Science 2024-02-09 Yichuan Deng, Hang Hu, Zhao Song, Omri Weinstein, Danyang Zhuo

Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs

We describe a computationally efficient, stochastic graph-regularization technique that can be utilized for the semi-supervised training of deep neural networks in a parallel or distributed setting. We utilize a technique, first described…

Machine Learning · Statistics 2018-05-31 Sunil Thulasidasan, Jeffrey Bilmes, Garrett Kenyon

Layer-Specific Adaptive Learning Rates for Deep Networks

The increasing complexity of deep learning architectures is resulting in training time requiring weeks or even months. This slow training is due in part to vanishing gradients, in which the gradients used by back-propagation are extremely…

Computer Vision and Pattern Recognition · Computer Science 2015-10-16 Bharat Singh, Soham De, Yangmuzi Zhang, Thomas Goldstein, Gavin Taylor

Learning Neural Networks by Neuron Pursuit

The first part of this paper studies the evolution of gradient flow for homogeneous neural networks near a class of saddle points exhibiting a sparsity structure. The choice of these saddle points is motivated from previous works on…

Machine Learning · Computer Science 2025-09-16 Akshay Kumar, Jarvis Haupt

Sequential Training of Neural Networks with Gradient Boosting

This paper presents a novel technique based on gradient boosting to train the final layers of a neural network (NN). Gradient boosting is an additive expansion algorithm in which a series of models are trained sequentially to approximate a…

Machine Learning · Computer Science 2023-05-05 Seyedsaman Emami, Gonzalo Martínez-Muñoz

OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training

The training of modern deep learning neural network calls for large amounts of computation, which is often provided by GPUs or other specific accelerators. To scale out to achieve faster training speed, two update algorithms are mainly…

Machine Learning · Computer Science 2020-05-15 Yemao Xu, Dezun Dong, Weixia Xu, Xiangke Liao

Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples

Self-paced learning and hard example mining re-weight training instances to improve learning accuracy. This paper presents two improved alternatives based on lightweight estimates of sample uncertainty in stochastic gradient descent (SGD):…

Machine Learning · Statistics 2018-01-09 Haw-Shiuan Chang, Erik Learned-Miller, Andrew McCallum

Training Artificial Neural Networks by Coordinate Search Algorithm

Training Artificial Neural Networks poses a challenging and critical problem in machine learning. Despite the effectiveness of gradient-based learning methods, such as Stochastic Gradient Descent (SGD), in training neural networks, they do…

Machine Learning · Computer Science 2024-02-21 Ehsan Rokhsatyazdi, Shahryar Rahnamayan, Sevil Zanjani Miyandoab, Azam Asilian Bidgoli, H. R. Tizhoosh

Research of Damped Newton Stochastic Gradient Descent Method for Neural Network Training

First-order methods like stochastic gradient descent (SGD) are recently the popular optimization method to train deep neural networks (DNNs), but second-order methods are scarcely used because of the overpriced computing cost in getting the…

Machine Learning · Computer Science 2021-04-01 Jingcheng Zhou, Wei Wei, Zhiming Zheng

A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks

In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep…

Machine Learning · Computer Science 2022-10-14 Mingrui Liu, Zhenxun Zhuang, Yunwei Lei, Chunyang Liao