Related papers: Linear Range in Gradient Descent

200 papers

In this article, we show that solving the system of linear equations by manipulating the kernel and the range space is equivalent to solving the problem of least squares error approximation. This establishes the ground for a gradient-free…

Machine Learning · Computer Science 2018-10-30 Kar-Ann Toh, Zhiping Lin, Zhengguo Li, Beomseok Oh, Lei Sun

Linear networks provide valuable insights into the workings of neural networks in general. This paper identifies conditions under which the gradient flow provably trains a linear network, in spite of the non-strict saddle points present in…

Optimization and Control · Mathematics 2020-06-30 Armin Eftekhari

Nonparametric regression with random design is considered. Estimates are defined by minimizing a penalized empirical $L_2$ risk over a suitably chosen class of neural networks with one hidden layer via gradient descent. Here, the gradient…

Statistics Theory · Mathematics 2019-12-10 Alina Braun, Michael Kohler, Harro Walk

Learning rules -- prescriptions for updating model parameters to improve performance -- are typically assumed rather than derived. Why do some learning rules work better than others, and under what assumptions can a given rule be considered…

Machine Learning · Computer Science 2025-11-03 John J. Vastola, Samuel J. Gershman, Kanaka Rajan

Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods…

Disordered Systems and Neural Networks · Physics 2009-10-31 Magnus Rattray, David Saad

We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size) on linear neural network training. We propose a tensor formulation of neural networks that includes fully-connected, diagonal, and…

Machine Learning · Computer Science 2021-09-13 Chulhee Yun, Shankar Krishnan, Hossein Mobahi

We address the challenging problem of deep representation learning--the efficient adaption of a pre-trained deep network to different tasks. Specifically, we propose to explore gradient-based features. These features are gradients of the…

Machine Learning · Computer Science 2020-04-14 Fangzhou Mu, Yingyu Liang, Yin Li

Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction to explain this phenomenon…

Machine Learning · Computer Science 2022-05-17 Hancheng Min, Salma Tarmoun, Rene Vidal, Enrique Mallada

We theoretically characterize gradient descent dynamics in deep linear networks trained at large width from random initialization and on large quantities of random data. Our theory captures the "wider is better" effect of…

Machine Learning · Computer Science 2025-06-17 Blake Bordelon, Cengiz Pehlevan

In this article, a novel approach to learning a complex function which can be written as the system of linear equations is introduced. This learning is grounded upon the observation that solving the system of linear equations by a…

Machine Learning · Computer Science 2018-10-23 Kar-Ann Toh

The optimization foundations of deep linear networks have recently received significant attention. However, due to their inherent non-convexity and hierarchical structure, analyzing the loss functions of deep linear networks remains a…

Optimization and Control · Mathematics 2025-09-24 Po Chen, Rujun Jiang, Peng Wang

We present a simple linear regression based approach for learning the weights and biases of a neural network, as an alternative to standard gradient based backpropagation. The present work is exploratory in nature, and we restrict the…

Machine Learning · Computer Science 2023-07-17 Harshad Khadilkar

Gradient descent, when applied to the task of logistic regression, outputs iterates which are biased to follow a unique ray defined by the data. The direction of this ray is the maximum margin predictor of a maximal linearly separable…

Machine Learning · Computer Science 2019-06-11 Ziwei Ji, Matus Telgarsky

Vanishing (and exploding) gradients effect is a common problem for recurrent neural networks with nonlinear activation functions which use backpropagation method for calculation of derivatives. Deep feedforward neural networks with many…

Neural and Evolutionary Computing · Computer Science 2017-02-15 Artem Chernodub, Dimitri Nowicki

We study the properties of alignment, a form of implicit regularization, in linear neural networks under gradient descent. We define alignment for fully connected networks with multidimensional outputs and show that it is a natural…

Machine Learning · Computer Science 2020-06-18 Adityanarayanan Radhakrishnan, Eshaan Nichani, Daniel Bernstein, Caroline Uhler

We analyze speed of convergence to global optimum for gradient descent training a deep linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by minimizing the $\ell_2$ loss over whitened data. Convergence at a linear…

Machine Learning · Computer Science 2019-10-29 Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu

Deep learning models, such as wide neural networks, can be conceptualized as nonlinear dynamical physical systems characterized by a multitude of interacting degrees of freedom. Such systems in the infinite limit, tend to exhibit simplified…

Machine Learning · Computer Science 2024-01-09 Ori Shem-Ur, Yaron Oz

Quantifying extra functions, herein referred to as outcome functions, over optimal solutions of an optimization problem can provide decision makers with additional information on a system. This bears more importance when the optimization…

Optimization and Control · Mathematics 2020-12-17 Mohsen Mohammadi, Monica Gentili

Inspired by recent research that recommends starting neural networks training with large learning rates (LRs) to achieve the best generalization, we explore this hypothesis in detail. Our study clarifies the initial LR ranges that provide…

Machine Learning · Computer Science 2023-11-21 Ekaterina Lobacheva, Eduard Pockonechnyy, Maxim Kodryan, Dmitry Vetrov

One of the most important parts of Artificial Neural Networks is minimizing the loss functions which tells us how good or bad our model is. To minimize these losses we need to tune the weights and biases. Also to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav