Related papers: Deep Learning Optimization Using Self-Adaptive Wei…

Optimizing Variational Physics-Informed Neural Networks Using Least Squares

Variational Physics-Informed Neural Networks often suffer from poor convergence when using stochastic gradient-descent-based optimizers. By introducing a Least Squares solver for the weights of the last layer of the neural network, we…

Numerical Analysis · Mathematics 2025-03-20 Carlos Uriarte, Manuela Bastidas, David Pardo, Jamie M. Taylor, Sergio Rojas

A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning

One of the most important parts of Artificial Neural Networks is minimizing the loss functions which tells us how good or bad our model is. To minimize these losses we need to tune the weights and biases. Also to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav

Gradient Descent based Optimization Algorithms for Deep Learning Models Training

In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…

Machine Learning · Computer Science 2019-03-12 Jiawei Zhang

Revisiting Recursive Least Squares for Training Deep Neural Networks

Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence. However, previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they…

Machine Learning · Computer Science 2021-09-08 Chunyuan Zhang, Qi Song, Hui Zhou, Yigui Ou, Hongyao Deng, Laurence Tianruo Yang

Self-adaptive weights based on balanced residual decay rate for physics-informed neural networks and deep operator networks

Physics-informed deep learning has emerged as a promising alternative for solving partial differential equations. However, for complex problems, training these networks can still be challenging, often resulting in unsatisfactory accuracy…

Machine Learning · Computer Science 2025-09-18 Wenqian Chen, Amanda A. Howard, Panos Stinis

An empirical analysis of the optimization of deep network loss surfaces

The success of deep neural networks hinges on our ability to accurately and efficiently optimize high-dimensional, non-convex functions. In this paper, we empirically investigate the loss functions of state-of-the-art networks, and how…

Machine Learning · Computer Science 2017-12-11 Daniel Jiwoong Im, Michael Tao, Kristin Branson

Hybrid Least Squares/Gradient Descent Methods for DeepONets

We propose an efficient hybrid least squares/gradient descent method to accelerate DeepONet training. Since the output of DeepONet can be viewed as linear with respect to the last layer parameters of the branch network, these parameters can…

Machine Learning · Computer Science 2025-08-22 Jun Choi, Chang-Ock Lee, Minam Moon

A Robust Adaptive Stochastic Gradient Method for Deep Learning

Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…

Machine Learning · Computer Science 2017-03-03 Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

Closed-Form Last Layer Optimization

Neural networks are typically optimized with variants of stochastic gradient descent. Under a squared loss, however, the optimal solution to the linear last layer weights is known in closed-form. We propose to leverage this during…

Machine Learning · Computer Science 2026-05-11 Alexandre Galashov, Nathaël Da Costa, Liyuan Xu, Philipp Hennig, Arthur Gretton

Convolution-weighting method for the physics-informed neural network: A Primal-Dual Optimization Perspective

Physics-informed neural networks (PINNs) are extensively employed to solve partial differential equations (PDEs) by ensuring that the outputs and gradients of deep learning models adhere to the governing equations. However, constrained by…

Machine Learning · Computer Science 2025-07-21 Chenhao Si, Ming Yan

Approximation results for Gradient Descent trained Shallow Neural Networks in $1d$

Two aspects of neural networks that have been extensively studied in the recent literature are their function approximation properties and their training by gradient descent methods. The approximation problem seeks accurate approximations…

Machine Learning · Computer Science 2022-09-20 R. Gentile, G. Welper

Layer Separation Deep Learning Model with Auxiliary Variables for Partial Differential Equations

In this paper, we propose a new optimization framework, the layer separation (LySep) model, to improve the deep learning-based methods in solving partial differential equations. Due to the highly non-convex nature of the loss function in…

Machine Learning · Computer Science 2025-07-18 Yaru Liu, Yiqi Gu

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

One of the mysteries in the success of neural networks is randomly initialized first order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. This paper demystifies…

Machine Learning · Computer Science 2019-02-06 Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh

Continuous vs. Discrete Optimization of Deep Neural Networks

Existing analyses of optimization in deep learning are either continuous, focusing on (variants of) gradient flow, or discrete, directly treating (variants of) gradient descent. Gradient flow is amenable to theoretical analysis, but is…

Machine Learning · Computer Science 2021-12-30 Omer Elkabetz, Nadav Cohen

Convergence of gradient descent for learning linear neural networks

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that under suitable conditions on…

Machine Learning · Computer Science 2021-11-25 Gabin Maxime Nguegnang, Holger Rauhut, Ulrich Terstiege

Active Learning Based Sampling for High-Dimensional Nonlinear Partial Differential Equations

The deep-learning-based least squares method has shown successful results in solving high-dimensional non-linear partial differential equations (PDEs). However, this method usually converges slowly. To speed up the convergence of this…

Numerical Analysis · Mathematics 2025-07-10 Wenhan Gao, Chunmei Wang

Solving Inverse Problems with Deep Linear Neural Networks: Global Convergence Guarantees for Gradient Descent with Weight Decay

Machine learning methods are commonly used to solve inverse problems, wherein an unknown signal must be estimated from few indirect measurements generated via a known acquisition procedure. In particular, neural networks perform well…

Machine Learning · Computer Science 2025-12-05 Hannah Laus, Suzanna Parkinson, Vasileios Charisopoulos, Felix Krahmer, Rebecca Willett

Training Deep Neural Networks via Direct Loss Minimization

Supervised training of deep neural nets typically relies on minimizing cross-entropy. However, in many domains, we are interested in performing well on metrics specific to the application. In this paper we propose a direct loss minimization…

Machine Learning · Computer Science 2016-06-03 Yang Song, Alexander G. Schwing, Richard S. Zemel, Raquel Urtasun

Adapting Auxiliary Losses Using Gradient Similarity

One approach to deal with the statistical inefficiency of neural networks is to rely on auxiliary losses that help to build useful representations. However, it is not always trivial to know if an auxiliary task will be helpful for the main…

Machine Learning · Statistics 2020-11-30 Yunshu Du, Wojciech M. Czarnecki, Siddhant M. Jayakumar, Mehrdad Farajtabar, Razvan Pascanu, Balaji Lakshminarayanan

Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

We prove linear convergence of gradient descent to a global optimum for the training of deep residual networks with constant layer width and smooth activation function. We show that if the trained weights, as a function of the layer index,…

Machine Learning · Computer Science 2023-01-26 Rama Cont, Alain Rossier, RenYuan Xu