English
Related papers

Related papers: Closed-Form Last Layer Optimization

200 papers

Variational Physics-Informed Neural Networks often suffer from poor convergence when using stochastic gradient-descent-based optimizers. By introducing a Least Squares solver for the weights of the last layer of the neural network, we…

Numerical Analysis · Mathematics 2025-03-20 Carlos Uriarte , Manuela Bastidas , David Pardo , Jamie M. Taylor , Sergio Rojas

In this paper, we develop a new optimization framework for the least squares learning problem via fully connected neural networks or physics-informed neural networks. The gradient descent sometimes behaves inefficiently in deep learning…

Machine Learning · Computer Science 2025-05-01 Yaru Liu , Yiqi Gu , Michael K. Ng

This work addresses weight optimization problem for fully-connected feed-forward neural networks. Unlike existing approaches that are based on back-propagation (BP) and chain rule gradient-based optimization (which implies iterative…

Machine Learning · Computer Science 2024-06-18 Slavisa Tomic , João Pedro Matos-Carvalho , Marko Beko

The largely successful method of training neural networks is to learn their weights using some variant of stochastic gradient descent (SGD). Here, we show that the solutions found by SGD can be further improved by ensembling a subset of the…

The stochastic gradient descent (SGD) method is a widely used approach for solving stochastic optimization problems, but its convergence is typically slow. Existing variance reduction techniques, such as SAGA, improve convergence by…

Optimization and Control · Mathematics 2025-11-21 Fabio Nobile , Matteo Raviola , Nathan Schaeffer

State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD…

Machine Learning · Computer Science 2022-03-23 Amirkeivan Mohtashami , Martin Jaggi , Sebastian U. Stich

We propose an efficient hybrid least squares/gradient descent method to accelerate DeepONet training. Since the output of DeepONet can be viewed as linear with respect to the last layer parameters of the branch network, these parameters can…

Machine Learning · Computer Science 2025-08-22 Jun Choi , Chang-Ock Lee , Minam Moon

Stochastic gradient descent (SGD) is the main approach for training deep networks: it moves towards the optimum of the cost function by iteratively updating the parameters of a model in the direction of the gradient of the loss evaluated on…

Machine Learning · Computer Science 2021-03-30 Loris Nanni , Gianluca Maguolo , Alessandra Lumini

Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the…

Machine Learning · Computer Science 2023-09-12 S. K. Roy , M. E. Paoletti , J. M. Haut , S. R. Dubey , P. Kar , A. Plaza , B. B. Chaudhuri

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate…

Machine Learning · Computer Science 2019-12-04 Michael R. Zhang , James Lucas , Geoffrey Hinton , Jimmy Ba

We consider the dynamics of gradient descent (GD) in overparameterized single hidden layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions, the parameter values obtained using GD achieve…

Machine Learning · Computer Science 2021-05-17 Siddhartha Satpathi , R Srikant

Deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent and thus has a slow convergence. In addition, softmax, as a decision layer, may ignore the distribution information of the data during…

Machine Learning · Computer Science 2021-06-16 Rui Zhang , Ziheng Jiao , Hongyuan Zhang , Xuelong Li

One of the most important parts of Artificial Neural Networks is minimizing the loss functions which tells us how good or bad our model is. To minimize these losses we need to tune the weights and biases. Also to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav

The success of deep neural networks hinges on our ability to accurately and efficiently optimize high-dimensional, non-convex functions. In this paper, we empirically investigate the loss functions of state-of-the-art networks, and how…

Machine Learning · Computer Science 2017-12-11 Daniel Jiwoong Im , Michael Tao , Kristin Branson

We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error. The method requires only simple processing of existing stochastic…

Machine Learning · Computer Science 2020-08-26 Dong Lao , Peihao Zhu , Peter Wonka , Ganesh Sundaramoorthi

Current state-of-the-art optimizers are adaptive gradient-based optimization methods such as Adam. Recently, there has been an increasing interest in formulating gradient-based optimizers in a probabilistic framework for better modeling the…

Machine Learning · Computer Science 2025-04-21 Haotian Chen , Anna Kuzina , Babak Esmaeili , Jakub M Tomczak

Neural network optimization remains one of the most consequential yet poorly understood challenges in modern AI research, where improvements in training algorithms can lead to enhanced feature learning in foundation models,…

Machine Learning · Computer Science 2025-12-23 Ansh Nagwekar

Under mild assumptions stochastic gradient methods asymptotically achieve an optimal rate of convergence if the arithmetic mean of all iterates is returned as an approximate optimal solution. However, in the absence of stochastic noise, the…

Optimization and Control · Mathematics 2022-10-06 Melinda Hagedorn , Florian Jarre

Stochastic gradient descent (SGD) now acts as a fundamental part of optimization in current machine learning. Meanwhile, deep learning architectures have shown outstanding performance in a wide range of fields, such as natural language…

Machine Learning · Computer Science 2026-01-27 Zhao Song , Song Yue

Neural networks are trained by optimizing multi-dimensional sets of fitting parameters on non-convex loss landscapes. Low-loss regions of the landscapes correspond to the parameter sets that perform well on the training data. A key issue in…

Machine Learning · Computer Science 2026-02-26 Jianneng Yu , Alexandre V. Morozov
‹ Prev 1 2 3 10 Next ›