Related papers: Closed-Form Last Layer Optimization

Optimizing Variational Physics-Informed Neural Networks Using Least Squares

Variational Physics-Informed Neural Networks often suffer from poor convergence when using stochastic gradient-descent-based optimizers. By introducing a Least Squares solver for the weights of the last layer of the neural network, we…

Numerical Analysis · Mathematics 2025-03-20 Carlos Uriarte, Manuela Bastidas, David Pardo, Jamie M. Taylor, Sergio Rojas

Deep Learning Optimization Using Self-Adaptive Weighted Auxiliary Variables

In this paper, we develop a new optimization framework for the least squares learning problem via fully connected neural networks or physics-informed neural networks. The gradient descent sometimes behaves inefficiently in deep learning…

Machine Learning · Computer Science 2025-05-01 Yaru Liu, Yiqi Gu, Michael K. Ng

A Closed-form Solution for Weight Optimization in Fully-connected Feed-forward Neural Networks

This work addresses weight optimization problem for fully-connected feed-forward neural networks. Unlike existing approaches that are based on back-propagation (BP) and chain rule gradient-based optimization (which implies iterative…

Machine Learning · Computer Science 2024-06-18 Slavisa Tomic, João Pedro Matos-Carvalho, Marko Beko

Neural networks with late-phase weights

The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD). Here, we show that the solutions found by SGD can be further improved by ensembling a subset of the…

Machine Learning · Computer Science 2022-04-12 Johannes von Oswald, Seijin Kobayashi, Alexander Meulemans, Christian Henning, Benjamin F. Grewe, João Sacramento

Stochastic gradient with least-squares control variates

The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by…

Optimization and Control · Mathematics 2025-11-21 Fabio Nobile, Matteo Raviola, Nathan Schaeffer

Masked Training of Neural Networks with Partial Gradients

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

Hybrid Least Squares/Gradient Descent Methods for DeepONets

We propose an efficient hybrid least squares/gradient descent method to accelerate DeepONet training. Since the output of DeepONet can be viewed as linear with respect to the last layer parameters of the branch network, these parameters can…

Machine Learning · Computer Science 2025-08-22 Jun Choi, Chang-Ock Lee, Minam Moon

Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks

Stochastic gradient descent (SGD) is the main approach for training deep networks: it moves towards the optimum of the cost function by iteratively updating the parameters of a model in the direction of the gradient of the loss evaluated on…

Machine Learning · Computer Science 2021-03-30 Loris Nanni, Gianluca Maguolo, Alessandra Lumini

AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks

Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the…

Machine Learning · Computer Science 2023-09-12 S. K. Roy, M. E. Paoletti, J. M. Haut, S. R. Dubey, P. Kar, A. Plaza, B. B. Chaudhuri

Lookahead Optimizer: k steps forward, 1 step back

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate…

Machine Learning · Computer Science 2019-12-04 Michael R. Zhang, James Lucas, Geoffrey Hinton, Jimmy Ba

The Dynamics of Gradient Descent for Overparametrized Neural Networks

We consider the dynamics of gradient descent (GD) in overparameterized single hidden layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions, the parameter values obtained using GD achieve…

Machine Learning · Computer Science 2021-05-17 Siddhartha Satpathi, R Srikant

Non-Gradient Manifold Neural Network

Deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent and thus has a slow convergence. In addition, softmax, as a decision layer, may ignore the distribution information of the data during…

Machine Learning · Computer Science 2021-06-16 Rui Zhang, Ziheng Jiao, Hongyuan Zhang, Xuelong Li

A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning

One of the most important parts of Artificial Neural Networks is minimizing the loss functions which tells us how good or bad our model is. To minimize these losses we need to tune the weights and biases. Also to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav

An empirical analysis of the optimization of deep network loss surfaces

The success of deep neural networks hinges on our ability to accurately and efficiently optimize high-dimensional, non-convex functions. In this paper, we empirically investigate the loss functions of state-of-the-art networks, and how…

Machine Learning · Computer Science 2017-12-11 Daniel Jiwoong Im, Michael Tao, Kristin Branson

Channel-Directed Gradients for Optimization of Convolutional Neural Networks

We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error. The method requires only simple processing of existing stochastic…

Machine Learning · Computer Science 2020-08-26 Dong Lao, Peihao Zhu, Peter Wonka, Ganesh Sundaramoorthi

Variational Stochastic Gradient Descent for Deep Neural Networks

Current state-of-the-art optimizers are adaptive gradient-based optimization methods such as Adam. Recently, there has been an increasing interest in formulating gradient-based optimizers in a probabilistic framework for better modeling the…

Machine Learning · Computer Science 2025-04-21 Haotian Chen, Anna Kuzina, Babak Esmaeili, Jakub M Tomczak

Towards Guided Descent: Optimization Algorithms for Training Neural Networks At Scale

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

Optimized convergence of stochastic gradient descent by weighted averaging

Under mild assumptions stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as an approximate optimal solution. However, in the absence of stochastic noise, the…

Optimization and Control · Mathematics 2022-10-06 Melinda Hagedorn, Florian Jarre

Structured and Fast Optimization: The Kronecker SGD Algorithm

Stochastic gradient descent (SGD) now acts as a fundamental part of optimization in current machine learning. Meanwhile, deep learning architectures have shown outstanding performance in a wide range of fields, such as natural language…

Machine Learning · Computer Science 2026-01-27 Zhao Song, Song Yue

Neural network optimization strategies and the topography of the loss landscape

Neural networks are trained by optimizing multi-dimensional sets of fitting parameters on non-convex loss landscapes. Low-loss regions of the landscapes correspond to the parameter sets that perform well on the training data. A key issue in…

Machine Learning · Computer Science 2026-02-26 Jianneng Yu, Alexandre V. Morozov