Related papers: Learning Unitaries by Gradient Descent

200 papers

Any gradient descent optimization requires choosing a learning rate. With deeper and deeper models, tuning that learning rate can easily become tedious and does not necessarily lead to ideal convergence. We propose a variation of the…

Machine Learning · Statistics 2018-04-10 Mathieu Ravaut, Satya Gorti

We study the convergence properties of gradient descent for training deep linear neural networks, i.e., deep matrix factorizations, by extending a previous analysis for the related gradient flow. We show that under suitable conditions on…

Machine Learning · Computer Science 2021-11-25 Gabin Maxime Nguegnang, Holger Rauhut, Ulrich Terstiege

In this paper, we develop a new optimization framework for the least squares learning problem via fully connected neural networks or physics-informed neural networks. Gradient descent sometimes behaves inefficiently in deep learning…

Machine Learning · Computer Science 2025-05-01 Yaru Liu, Yiqi Gu, Michael K. Ng

This paper presents a novel coordinate descent algorithm leveraging a combination of one-directional line search and gradient information for parameter updates for a squared error loss function. Each parameter undergoes updates determined…

Machine Learning · Computer Science 2024-08-05 Yen-Che Hsiao, Abhishek Dutta

The ability to learn useful features is one of the major advantages of neural networks. Although recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works…

Machine Learning · Computer Science 2024-11-06 Mo Zhou, Rong Ge

Significant theoretical work has established that in specific regimes, neural networks trained by gradient descent behave like kernel methods. However, in practice, it is known that neural networks strongly outperform their associated…

Machine Learning · Computer Science 2022-07-01 Alex Damian, Jason D. Lee, Mahdi Soltanolkotabi

Gradient descent typically converges to a single minimum of the training loss without mechanisms to explore alternative minima that may generalize better. Searching for diverse minima directly in high-dimensional parameter space is…

Machine Learning · Computer Science 2025-09-16 Akshay Vegesna, Samip Dahal

We consider the dynamics of gradient descent (GD) in overparameterized single hidden layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions, the parameter values obtained using GD achieve…

Machine Learning · Computer Science 2021-05-17 Siddhartha Satpathi, R Srikant

Convolutional neural networks are widely used in imaging and image recognition. Learning such networks from training data leads to the minimization of a non-convex function. This makes the analysis of standard optimization methods such as…

Optimization and Control · Mathematics 2026-01-14 Jona-Maria Diederen, Holger Rauhut, Ulrich Terstiege

Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods…

Disordered Systems and Neural Networks · Physics 2009-10-31 Magnus Rattray, David Saad

One of the most important parts of artificial neural networks is minimizing the loss function, which tells us how good or bad our model is. To minimize this loss, we need to tune the weights and biases. Also, to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav

Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate iterations of…

Machine Learning · Computer Science 2023-11-13 Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the…

Machine Learning · Computer Science 2019-06-12 Difan Zou, Quanquan Gu

Variational quantum circuits have arisen as an important method in quantum computing. A crucial step in using them is parameter optimization, which is typically tackled through gradient-descent techniques. We advantageously explore instead the use…

Quantum Physics · Physics 2024-12-24 Vignesh Anantharamakrishnan, Márcio M. Taddei

Machine unlearning algorithms aim to remove the impact of selected training data from a model without the computational expense of retraining from scratch. Two such algorithms are "Descent-to-Delete" (D2D) and "Rewind-to-Delete" (R2D),…

Machine Learning · Computer Science 2026-03-02 Siqiao Mu, Diego Klabjan

When training the parameters of a linear dynamical model, the gradient descent algorithm is likely to fail to converge if the squared-error loss is used as the training loss function. Restricting the parameter space to a smaller subset and…

Machine Learning · Computer Science 2020-07-13 Kamil Nar, Yuan Xue, Andrew M. Dai

The problem of least squares regression of a $d$-dimensional unknown parameter is considered. A stochastic gradient descent based algorithm with weighted iterate-averaging that uses a single pass over the data is studied and its convergence…

Information Theory · Computer Science 2016-06-10 Kobi Cohen, Angelia Nedic, R. Srikant

In this paper, we study the dynamics of gradient descent in learning neural networks for classification problems. Unlike in existing works, we consider the linearly non-separable case where the training data of different classes lie in…

Machine Learning · Computer Science 2020-12-11 Ziang Long, Penghang Yin, Jack Xin

Learning rate adaptation is a popular topic in machine learning. Gradient descent trains neural networks with a fixed learning rate. Learning rate adaptation is proposed to accelerate the training process by adjusting the step size in…

Machine Learning · Computer Science 2022-10-20 Bozhou Chen, Hongzhi Wang, Chenmin Ba

A major challenge in the training of recurrent neural networks is the so-called vanishing or exploding gradient problem. The use of a norm-preserving transition operator can address this issue, but parametrization is challenging. In this…

Machine Learning · Statistics 2017-01-11 Stephanie L. Hyland, Gunnar Rätsch