Related papers: Deep Learning Optimization Using Self-Adaptive Wei…
Variational Physics-Informed Neural Networks often suffer from poor convergence when using stochastic gradient-descent-based optimizers. By introducing a Least Squares solver for the weights of the last layer of the neural network, we…
One of the most important parts of Artificial Neural Networks is minimizing the loss functions which tells us how good or bad our model is. To minimize these losses we need to tune the weights and biases. Also to calculate the minimum value…
In this paper, we aim at providing an introduction to the gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to…
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence. However, previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they…
Physics-informed deep learning has emerged as a promising alternative for solving partial differential equations. However, for complex problems, training these networks can still be challenging, often resulting in unsatisfactory accuracy…
The success of deep neural networks hinges on our ability to accurately and efficiently optimize high-dimensional, non-convex functions. In this paper, we empirically investigate the loss functions of state-of-the-art networks, and how…
We propose an efficient hybrid least squares/gradient descent method to accelerate DeepONet training. Since the output of DeepONet can be viewed as linear with respect to the last layer parameters of the branch network, these parameters can…
Stochastic gradient algorithms are the main focus of large-scale optimization problems and led to important successes in the recent advancement of the deep learning algorithms. The convergence of SGD depends on the careful choice of…
Neural networks are typically optimized with variants of stochastic gradient descent. Under a squared loss, however, the optimal solution to the linear last layer weights is known in closed-form. We propose to leverage this during…
Physics-informed neural networks (PINNs) are extensively employed to solve partial differential equations (PDEs) by ensuring that the outputs and gradients of deep learning models adhere to the governing equations. However, constrained by…
Two aspects of neural networks that have been extensively studied in the recent literature are their function approximation properties and their training by gradient descent methods. The approximation problem seeks accurate approximations…
In this paper, we propose a new optimization framework, the layer separation (LySep) model, to improve the deep learning-based methods in solving partial differential equations. Due to the highly non-convex nature of the loss function in…
One of the mysteries in the success of neural networks is randomly initialized first order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. This paper demystifies…
Existing analyses of optimization in deep learning are either continuous, focusing on (variants of) gradient flow, or discrete, directly treating (variants of) gradient descent. Gradient flow is amenable to theoretical analysis, but is…
We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that under suitable conditions on…
The deep-learning-based least squares method has shown successful results in solving high-dimensional non-linear partial differential equations (PDEs). However, this method usually converges slowly. To speed up the convergence of this…
Machine learning methods are commonly used to solve inverse problems, wherein an unknown signal must be estimated from few indirect measurements generated via a known acquisition procedure. In particular, neural networks perform well…
Supervised training of deep neural nets typically relies on minimizing cross-entropy. However, in many domains, we are interested in performing well on metrics specific to the application. In this paper we propose a direct loss minimization…
One approach to deal with the statistical inefficiency of neural networks is to rely on auxiliary losses that help to build useful representations. However, it is not always trivial to know if an auxiliary task will be helpful for the main…
We prove linear convergence of gradient descent to a global optimum for the training of deep residual networks with constant layer width and smooth activation function. We show that if the trained weights, as a function of the layer index,…