Related papers: Linear Range in Gradient Descent

Gradient-Free Learning Based on the Kernel and the Range Space

In this article, we show that solving the system of linear equations by manipulating the kernel and the range space is equivalent to solving the problem of least squares error approximation. This establishes the ground for a gradient-free…

Machine Learning · Computer Science 2018-10-30 Kar-Ann Toh , Zhiping Lin , Zhengguo Li , Beomseok Oh , Lei Sun

Training Linear Neural Networks: Non-Local Convergence and Complexity Results

Linear networks provide valuable insights into the workings of neural networks in general. This paper identifies conditions under which the gradient flow provably trains a linear network, in spite of the non-strict saddle points present in…

Optimization and Control · Mathematics 2020-06-30 Armin Eftekhari

On the rate of convergence of a neural network regression estimate learned by gradient descent

Nonparametric regression with random design is considered. Estimates are defined by minimizing a penalized empirical $L_2$ risk over a suitably chosen class of neural networks with one hidden layer via gradient descent. Here, the gradient…

Statistics Theory · Mathematics 2019-12-10 Alina Braun , Michael Kohler , Harro Walk

Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules

Learning rules -- prescriptions for updating model parameters to improve performance -- are typically assumed rather than derived. Why do some learning rules work better than others, and under what assumptions can a given rule be considered…

Machine Learning · Computer Science 2025-11-03 John J. Vastola , Samuel J. Gershman , Kanaka Rajan

Analysis of Natural Gradient Descent for Multilayer Neural Networks

Natural gradient descent is a principled method for adapting the parameters of a statistical model on-line using an underlying Riemannian parameter space to redefine the direction of steepest descent. The algorithm is examined via methods…

Disordered Systems and Neural Networks · Physics 2009-10-31 Magnus Rattray , David Saad

A Unifying View on Implicit Bias in Training Linear Neural Networks

We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size) on linear neural network training. We propose a tensor formulation of neural networks that includes fully-connected, diagonal, and…

Machine Learning · Computer Science 2021-09-13 Chulhee Yun , Shankar Krishnan , Hossein Mobahi

Gradients as Features for Deep Representation Learning

We address the challenging problem of deep representation learning--the efficient adaption of a pre-trained deep network to different tasks. Specifically, we propose to explore gradient-based features. These features are gradients of the…

Machine Learning · Computer Science 2020-04-14 Fangzhou Mu , Yingyu Liang , Yin Li

Convergence and Implicit Bias of Gradient Flow on Overparametrized Linear Networks

Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction to explain this phenomenon…

Machine Learning · Computer Science 2022-05-17 Hancheng Min , Salma Tarmoun , Rene Vidal , Enrique Mallada

Deep Linear Network Training Dynamics from Random Initialization: Data, Width, Depth, and Hyperparameter Transfer

We theoretically characterize gradient descent dynamics in deep linear networks trained at large width from random initialization and on large quantities of random data. Our theory captures the "wider is better" effect of…

Machine Learning · Computer Science 2025-06-17 Blake Bordelon , Cengiz Pehlevan

Learning from the Kernel and the Range Space

In this article, a novel approach to learning a complex function which can be written as the system of linear equations is introduced. This learning is grounded upon the observation that solving the system of linear equations by a…

Machine Learning · Computer Science 2018-10-23 Kar-Ann Toh

Error Bound Analysis for the Regularized Loss of Deep Linear Neural Networks

The optimization foundations of deep linear networks have recently received significant attention. However, due to their inherent non-convexity and hierarchical structure, analyzing the loss functions of deep linear networks remains a…

Optimization and Control · Mathematics 2025-09-24 Po Chen , Rujun Jiang , Peng Wang

Using Linear Regression for Iteratively Training Neural Networks

We present a simple linear regression based approach for learning the weights and biases of a neural network, as an alternative to standard gradient based backpropagation. The present work is exploratory in nature, and we restrict the…

Machine Learning · Computer Science 2023-07-17 Harshad Khadilkar

Risk and parameter convergence of logistic regression

Gradient descent, when applied to the task of logistic regression, outputs iterates which are biased to follow a unique ray defined by the data. The direction of this ray is the maximum margin predictor of a maximal linearly separable…

Machine Learning · Computer Science 2019-06-11 Ziwei Ji , Matus Telgarsky

Sampling-based Gradient Regularization for Capturing Long-Term Dependencies in Recurrent Neural Networks

Vanishing (and exploding) gradients effect is a common problem for recurrent neural networks with nonlinear activation functions which use backpropagation method for calculation of derivatives. Deep feedforward neural networks with many…

Neural and Evolutionary Computing · Computer Science 2017-02-15 Artem Chernodub , Dimitri Nowicki

On Alignment in Deep Linear Neural Networks

We study the properties of alignment, a form of implicit regularization, in linear neural networks under gradient descent. We define alignment for fully connected networks with multidimensional outputs and show that it is a natural…

Machine Learning · Computer Science 2020-06-18 Adityanarayanan Radhakrishnan , Eshaan Nichani , Daniel Bernstein , Caroline Uhler

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks

We analyze speed of convergence to global optimum for gradient descent training a deep linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by minimizing the $\ell_2$ loss over whitened data. Convergence at a linear…

Machine Learning · Computer Science 2019-10-29 Sanjeev Arora , Nadav Cohen , Noah Golowich , Wei Hu

Weak Correlations as the Underlying Principle for Linearization of Gradient-Based Learning Systems

Deep learning models, such as wide neural networks, can be conceptualized as nonlinear dynamical physical systems characterized by a multitude of interacting degrees of freedom. Such systems in the infinite limit, tend to exhibit simplified…

Machine Learning · Computer Science 2024-01-09 Ori Shem-Ur , Yaron Oz

The Outcome Range Problem in Interval Linear Programming

Quantifying extra functions, herein referred to as outcome functions, over optimal solutions of an optimization problem can provide decision makers with additional information on a system. This bears more importance when the optimization…

Optimization and Control · Mathematics 2020-12-17 Mohsen Mohammadi , Monica Gentili

Large Learning Rates Improve Generalization: But How Large Are We Talking About?

Inspired by recent research that recommends starting neural networks training with large learning rates (LRs) to achieve the best generalization, we explore this hypothesis in detail. Our study clarifies the initial LR ranges that provide…

Machine Learning · Computer Science 2023-11-21 Ekaterina Lobacheva , Eduard Pockonechnyy , Maxim Kodryan , Dmitry Vetrov

A Comprehensive Study on Optimization Strategies for Gradient Descent In Deep Learning

One of the most important parts of Artificial Neural Networks is minimizing the loss functions which tells us how good or bad our model is. To minimize these losses we need to tune the weights and biases. Also to calculate the minimum value…

Machine Learning · Computer Science 2021-01-08 Kaustubh Yadav