English
Related papers

Related papers: Predicting Training Time Without Training

200 papers

Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a global optimizer with perfect fit (zero-loss) in many…

Machine Learning · Computer Science 2025-03-07 Zhiyan Ding , Shi Chen , Qin Li , Stephen Wright

Dynamic DNN optimization techniques such as layer-skipping offer increased adaptability and efficiency gains but can lead to i) a larger memory footprint as in decision gates, ii) increased training complexity (e.g., with non-differentiable…

Machine Learning · Computer Science 2025-05-26 Guilherme Korol , Antonio Carlos Schneider Beck , Jeronimo Castrillon

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami , Martin Jaggi , Sebastian U. Stich

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any…

Machine Learning · Statistics 2013-02-19 Tom Schaul , Sixin Zhang , Yann LeCun

Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that iteratively improve the model parameters by estimating a gradient on a very small fraction of the training data. A major roadblock faced when…

Machine Learning · Computer Science 2020-06-11 Tao Lin , Lingjing Kong , Sebastian U. Stich , Martin Jaggi

Deep learning methods achieve state-of-the-art performance in many application scenarios. Yet, these methods require a significant amount of hyperparameters tuning in order to achieve the best results. In particular, tuning the learning…

Machine Learning · Computer Science 2017-11-07 Francesco Orabona , Tatiana Tommasi

Neural network pruning is useful for discovering efficient, high-performing subnetworks within pre-trained, dense network architectures. More often than not, it involves a three-step process -- pre-training, pruning, and re-training -- that…

Machine Learning · Statistics 2023-08-24 Cameron R. Wolfe , Fangshuo Liao , Qihan Wang , Junhyung Lyle Kim , Anastasios Kyrillidis

Training deep neural network is a high dimensional and a highly non-convex optimization problem. Stochastic gradient descent (SGD) algorithm and it's variations are the current state-of-the-art solvers for this task. However, due to…

Machine Learning · Computer Science 2017-01-17 Xi He , Dheevatsa Mudigere , Mikhail Smelyanskiy , Martin Takáč

Stochastic Gradient Descent (SGD) has proven to be remarkably effective in optimizing deep neural networks that employ ever-larger numbers of parameters. Yet, improving the efficiency of large-scale optimization remains a vital and highly…

Machine Learning · Computer Science 2020-11-11 Frithjof Gressmann , Zach Eaton-Rosen , Carlo Luschi

We study the convergence of gradient descent (GD) and stochastic gradient descent (SGD) for training $L$-hidden-layer linear residual networks (ResNets). We prove that for training deep residual networks with certain linear transformations…

Machine Learning · Computer Science 2020-03-03 Difan Zou , Philip M. Long , Quanquan Gu

The choice of step-size used in Stochastic Gradient Descent (SGD) optimization is empirically selected in most training procedures. Moreover, the use of scheduled learning techniques such as Step-Decaying, Cyclical-Learning, and Warmup to…

Machine Learning · Computer Science 2020-06-12 Mahdi S. Hosseini , Konstantinos N. Plataniotis

An important class of problems involves training deep neural networks with sparse prediction targets of very high dimension D. These occur naturally in e.g. neural language models or the learning of word-embeddings, often posed as…

Neural and Evolutionary Computing · Computer Science 2015-07-15 Pascal Vincent , Alexandre de Brébisson , Xavier Bouthillier

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre , Jose Sotelo , Marcin Moczulski , Yoshua Bengio

The learning rate is perhaps the single most important parameter in the training of neural networks and, more broadly, in stochastic (nonconvex) optimization. Accordingly, there are numerous effective, but poorly understood, techniques for…

Machine Learning · Computer Science 2020-04-16 Bin Shi , Weijie J. Su , Michael I. Jordan

A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based…

Machine Learning · Statistics 2025-03-19 Logan Engstrom , Andrew Ilyas , Benjamin Chen , Axel Feldmann , William Moses , Aleksander Madry

Modern deep models are often pretrained on large-scale data with missing labels using composite objectives, where the relative weights of multiple loss terms act as hyperparameters. Tuning these weights with random search or Bayesian…

Machine Learning · Computer Science 2026-05-11 Ivan Karpukhin , Andrey Savchenko

We propose an algorithm for the adaptation of the learning rate for stochastic gradient descent (SGD) that avoids the need for validation set use. The idea for the adaptiveness comes from the technique of extrapolation: to get an estimate…

Machine Learning · Statistics 2020-08-28 Antti Koskela , Antti Honkela

Continuous-time models provide important insights into the training dynamics of optimization algorithms in deep learning. In this work, we establish a non-asymptotic convergence analysis of stochastic gradient Langevin dynamics (SGLD),…

Machine Learning · Computer Science 2026-01-30 Noah Oberweis , Semih Cayci

Stochastic Gradient Descent (SGD) has played a central role in machine learning. However, it requires a carefully hand-picked stepsize for fast convergence, which is notoriously tedious and time-consuming to tune. Over the last several…

Machine Learning · Computer Science 2019-06-10 Zhenxun Zhuang , Ashok Cutkosky , Francesco Orabona

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar
‹ Prev 1 2 3 10 Next ›