Related papers: Learning Unitaries by Gradient Descent

Gradient descent revisited via an adaptive online learning rate

Any gradient descent optimization requires choosing a learning rate. With ever deeper models, tuning that learning rate can easily become tedious and does not necessarily lead to ideal convergence. We propose a variation of the…

Machine Learning · Statistics 2018-04-10 Mathieu Ravaut, Satya Gorti

Convergence of gradient descent for learning linear neural networks

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that under suitable conditions on…

Machine Learning · Computer Science 2021-11-25 Gabin Maxime Nguegnang, Holger Rauhut, Ulrich Terstiege

Deep Learning Optimization Using Self-Adaptive Weighted Auxiliary Variables

In this paper, we develop a new optimization framework for the least squares learning problem via fully connected neural networks or physics-informed neural networks. Gradient descent sometimes behaves inefficiently in deep learning…

Machine Learning · Computer Science 2025-05-01 Yaru Liu, Yiqi Gu, Michael K. Ng

Hybrid Coordinate Descent for Efficient Neural Network Learning Using Line Search and Gradient Descent

This paper presents a novel coordinate descent algorithm leveraging a combination of one-directional line search and gradient information for parameter updates for a squared error loss function. Each parameter undergoes updates determined…

Machine Learning · Computer Science 2024-08-05 Yen-Che Hsiao, Abhishek Dutta

How Does Gradient Descent Learn Features -- A Local Analysis for Regularized Two-Layer Neural Networks

The ability to learn useful features is one of the major advantages of neural networks. Although recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works…

Machine Learning · Computer Science 2024-11-06 Mo Zhou, Rong Ge

Neural Networks can Learn Representations with Gradient Descent

Significant theoretical work has established that in specific regimes, neural networks trained by gradient descent behave like kernel methods. However, in practice, it is known that neural networks strongly outperform their associated…

Machine Learning · Computer Science 2022-07-01 Alex Damian, Jason D. Lee, Mahdi Soltanolkotabi

Decoupling Search and Learning in Neural Net Training

Gradient descent typically converges to a single minimum of the training loss without mechanisms to explore alternative minima that may generalize better. Searching for diverse minima directly in high-dimensional parameter space is…

Machine Learning · Computer Science 2025-09-16 Akshay Vegesna, Samip Dahal

The Dynamics of Gradient Descent for Overparametrized Neural Networks

We consider the dynamics of gradient descent (GD) in overparameterized single hidden layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions, the parameter values obtained using GD achieve…

Machine Learning · Computer Science 2021-05-17 Siddhartha Satpathi, R Srikant

Convergence of gradient flow for learning convolutional neural networks

Convolutional neural networks are widely used in imaging and image recognition. Learning such networks from training data leads to the minimization of a non-convex function. This makes the analysis of standard optimization methods such as…

Optimization and Control · Mathematics 2026-01-14 Jona-Maria Diederen, Holger Rauhut, Ulrich Terstiege

Analysis of Natural Gradient Descent for Multilayer Neural Networks

Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods…

Disordered Systems and Neural Networks · Physics 2009-10-31 Magnus Rattray, David Saad

A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning

One of the most important parts of artificial neural networks is minimizing the loss function, which tells us how good or bad our model is. To minimize this loss, we need to tune the weights and biases. Also, to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav

Transformers learn to implement preconditioned gradient descent for in-context learning

Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate iterations of…

Machine Learning · Computer Science 2023-11-13 Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra

An Improved Analysis of Training Over-parameterized Deep Neural Networks

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the…

Machine Learning · Computer Science 2019-06-12 Difan Zou, Quanquan Gu

Quantum Time-Series Learning with Evolutionary Algorithms

Variational quantum circuits have arisen as an important method in quantum computing. A crucial step in this method is parameter optimization, which is typically tackled through gradient-descent techniques. We advantageously explore instead the use…

Quantum Physics · Physics 2024-12-24 Vignesh Anantharamakrishnan, Márcio M. Taddei

Descend or Rewind? Stochastic Gradient Descent Unlearning

Machine unlearning algorithms aim to remove the impact of selected training data from a model without the computational expenses of retraining from scratch. Two such algorithms are "Descent-to-Delete" (D2D) and "Rewind-to-Delete" (R2D),…

Machine Learning · Computer Science 2026-03-02 Siqiao Mu, Diego Klabjan

Learning Unstable Dynamical Systems with Time-Weighted Logarithmic Loss

When training the parameters of a linear dynamical model, the gradient descent algorithm is likely to fail to converge if the squared-error loss is used as the training loss function. Restricting the parameter space to a smaller subset and…

Machine Learning · Computer Science 2020-07-13 Kamil Nar, Yuan Xue, Andrew M. Dai

On Projected Stochastic Gradient Descent Algorithm with Weighted Averaging for Least Squares Regression

The problem of least squares regression of a $d$-dimensional unknown parameter is considered. A stochastic gradient descent based algorithm with weighted iterate-averaging that uses a single pass over the data is studied and its convergence…

Information Theory · Computer Science 2016-06-10 Kobi Cohen, Angelia Nedic, R. Srikant

Global Convergence and Geometric Characterization of Slow to Fast Weight Evolution in Neural Network Training for Classifying Linearly Non-Separable Data

In this paper, we study the dynamics of gradient descent in learning neural networks for classification problems. Unlike in existing works, we consider the linearly non-separable case where the training data of different classes lie in…

Machine Learning · Computer Science 2020-12-11 Ziang Long, Penghang Yin, Jack Xin

Differentiable Self-Adaptive Learning Rate

Learning rate adaptation is a popular topic in machine learning. Gradient descent trains neural networks with a fixed learning rate. Learning rate adaptation is proposed to accelerate the training process by adjusting the step size in…

Machine Learning · Computer Science 2022-10-20 Bozhou Chen, Hongzhi Wang, Chenmin Ba

Learning Unitary Operators with Help From u(n)

A major challenge in the training of recurrent neural networks is the so-called vanishing or exploding gradient problem. The use of a norm-preserving transition operator can address this issue, but parametrization is challenging. In this…

Machine Learning · Statistics 2017-01-11 Stephanie L. Hyland, Gunnar Rätsch