Related papers: Automatic Gradient Descent: Deep Learning without …

200 papers

Descent methods for deep networks are notoriously capricious: they require careful tuning of step size, momentum and weight decay, and which method will work best on a new benchmark is a priori unclear. To address this problem, this paper…

Neural and Evolutionary Computing · Computer Science · 2021-09-21 · Yang Liu, Jeremy Bernstein, Markus Meister, Yisong Yue

Neural networks trained with standard objectives exhibit behaviors characteristic of probabilistic inference: soft clustering, prototype specialization, and Bayesian uncertainty tracking. These phenomena appear across architectures -- in…

Machine Learning · Computer Science · 2026-01-01 · Alan Oursland

We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad…

A candidate explanation of the good empirical performance of deep neural networks is the implicit regularization effect of first order optimization methods. Inspired by this, we prove a convergence theorem for nonconvex composite…

Machine Learning · Computer Science · 2023-02-14 · Dávid Terjék, Diego González-Sánchez

We analyze recurrent neural networks with diagonal hidden-to-hidden weight matrices, trained with gradient descent in the supervised learning setting, and prove that gradient descent can achieve optimality without massive…

Machine Learning · Computer Science · 2024-10-11 · Semih Cayci, Atilla Eryilmaz

Adaptive gradient methods such as Adam have gained increasing popularity in deep learning optimization. However, it has been observed that compared with (stochastic) gradient descent, Adam can converge to a different solution with a…

Machine Learning · Computer Science · 2021-08-26 · Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…

Machine Learning · Computer Science · 2019-03-12 · Jiawei Zhang

The training of machine learning models is typically carried out using some form of gradient descent, often with great success. However, non-asymptotic analyses of first-order optimization algorithms typically employ a gradient smoothness…

Machine Learning · Computer Science · 2024-06-18 · Thomas Flynn

A main puzzle of deep networks revolves around the absence of overfitting despite large overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamics…

Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of…

Machine Learning · Computer Science · 2012-09-18 · Yoshua Bengio

How does the choice of optimization algorithm shape a model's ability to learn features? To address this question for steepest descent methods -- including sign descent, which is closely related to Adam -- we introduce steepest mirror flows…

Machine Learning · Computer Science · 2026-03-03 · Tom Jacobs, Chao Zhou, Rebekka Burkholz

Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized…

Machine Learning · Computer Science · 2019-05-30 · Simon S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Xiyu Zhai

This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction of…

Machine Learning · Computer Science · 2020-10-30 · Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma

We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows. Distinct from the fixed-space global…

Optimization and Control · Mathematics · 2025-01-13 · David A. R. Robin, Kevin Scaman, Marc Lelarge

Depth separation results propose a possible theoretical explanation for the benefits of deep neural networks over shallower architectures, establishing that the former possess superior approximation capabilities. However, there are no known…

Machine Learning · Computer Science · 2023-02-03 · Itay Safran, Jason D. Lee

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science · 2025-12-23 · Ansh Nagwekar

Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as its step size. Recent work has shown how the step size can itself be optimized alongside the model…

Machine Learning · Computer Science · 2022-10-18 · Kartik Chandra, Audrey Xie, Jonathan Ragan-Kelley, Erik Meijer

Optimisers are an essential component for training machine learning models, and their design influences learning speed and generalisation. Several studies have attempted to learn more effective gradient-descent optimisers via solving a…

Machine Learning · Computer Science · 2022-03-08 · Boyan Gao, Henry Gouk, Hae Beom Lee, Timothy M. Hospedales

In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing…

Machine Learning · Statistics · 2021-11-24 · Anna Kerekes, Anna Mészáros, Ferenc Huszár

One of the most important parts of Artificial Neural Networks is minimizing the loss function, which tells us how good or bad our model is. To minimize this loss, we need to tune the weights and biases. Also, to calculate the minimum value…

Machine Learning · Computer Science · 2021-01-08 · Kaustubh Yadav