Related papers: Gradient Descent Quantizes ReLU Network Features

200 papers

Implicit deep learning has received increasing attention recently due to the fact that it generalizes the recursive prediction rules of many commonly used neural network architectures. Its prediction rule is provided implicitly based on the…

Machine Learning · Computer Science 2022-02-21 Tianxiang Gao, Hailiang Liu, Jia Liu, Hridesh Rajan, Hongyang Gao

Empirical studies show that gradient-based methods can learn deep neural networks (DNNs) with very good generalization performance in the over-parameterization regime, where DNNs can easily fit a random labeling of the training data. Very…

Machine Learning · Computer Science 2019-11-28 Yuan Cao, Quanquan Gu

We study the problem of training deep neural networks with Rectified Linear Unit (ReLU) activation function using gradient descent and stochastic gradient descent. In particular, we study the binary classification problem and show that for…

Machine Learning · Computer Science 2018-12-31 Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu

We consider training over-parameterized two-layer neural networks with Rectified Linear Unit (ReLU) using gradient descent (GD) method. Inspired by a recent line of work, we study the evolutions of network prediction errors across GD…

Machine Learning · Computer Science 2019-09-04 Lili Su, Pengkun Yang

Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class…

Machine Learning · Computer Science 2019-08-02 Yuanzhi Li, Yingyu Liang

Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning. In this work, we take a mean-field view, and consider a two-layer ReLU network trained via SGD for…

Machine Learning · Computer Science 2022-05-02 Alexander Shevchenko, Vyacheslav Kungurtsev, Marco Mondelli

One of the mysteries in the success of neural networks is randomly initialized first order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. This paper demystifies…

Machine Learning · Computer Science 2019-02-06 Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh

While deep learning is successful in a number of applications, it is not yet well understood theoretically. A satisfactory theoretical characterization of deep learning however, is beginning to emerge. It covers the following questions: 1)…

Machine Learning · Computer Science 2019-08-27 Tomaso Poggio, Andrzej Banburski, Qianli Liao

Overparameterized ML models, including neural networks, typically induce underdetermined training objectives with multiple global minima. The implicit bias refers to the limiting global minimum that is attained by a common optimization…

Machine Learning · Statistics 2026-03-06 Kuo-Wei Lai, Guanghui Wang, Molei Tao, Vidya Muthukumar

We consider the dynamics of gradient descent (GD) in overparameterized single hidden layer neural networks with a squared loss function. Recently, it has been shown that, under some conditions, the parameter values obtained using GD achieve…

Machine Learning · Computer Science 2021-05-17 Siddhartha Satpathi, R Srikant

Deep neural networks (DNNs) have demonstrated dominating performance in many fields; since AlexNet, networks used in practice are going wider and deeper. On the theoretical side, a long line of works has been focusing on training neural…

Machine Learning · Computer Science 2019-06-18 Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song

Implicit neural networks have become increasingly attractive in the machine learning community since they can achieve competitive performance but use much less computational resources. Recently, a line of theoretical works established the…

Machine Learning · Computer Science 2022-10-03 Tianxiang Gao, Hongyang Gao

We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that,…

Machine Learning · Computer Science 2019-11-13 Yuan Cao, Quanquan Gu

Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well…

Machine Learning · Statistics 2021-09-28 Tianyang Hu, Wenjia Wang, Cong Lin, Guang Cheng

Overparameterization in deep learning typically refers to settings where a trained neural network (NN) has representational capacity to fit the training data in many ways, some of which generalize well, while others do not. In the case of…

Machine Learning · Computer Science 2023-03-24 Edo Cohen-Karlik, Itamar Menuhin-Gruman, Raja Giryes, Nadav Cohen, Amir Globerson

The generalization mystery of overparametrized deep nets has motivated efforts to understand how gradient descent (GD) converges to low-loss solutions that generalize well. Real-life neural networks are initialized from small random values…

Machine Learning · Computer Science 2021-11-10 Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora

Implicit deep learning has recently become popular in the machine learning community since these implicit models can achieve competitive performance with state-of-the-art deep networks while using significantly less memory and computational…

Machine Learning · Computer Science 2022-05-17 Tianxiang Gao, Hongyang Gao

Weight decay is one of the most widely used forms of regularization in deep learning, and has been shown to improve generalization and robustness. The optimization objective driving weight decay is a sum of losses plus a term proportional…

Machine Learning · Computer Science 2023-07-07 Liu Yang, Jifan Zhang, Joseph Shenouda, Dimitris Papailiopoulos, Kangwook Lee, Robert D. Nowak

We study the implicit bias towards low-rank weight matrices when training neural networks (NN) with Weight Decay (WD). We prove that when a ReLU NN is sufficiently trained with Stochastic Gradient Descent (SGD) and WD, its weight matrix is…

Machine Learning · Computer Science 2024-10-04 Ke Chen, Chugang Yi, Haizhao Yang

Several works have aimed to explain why overparameterized neural networks generalize well when trained by Stochastic Gradient Descent (SGD). The consensus explanation that has emerged credits the randomized nature of SGD for the bias of the…

Machine Learning · Computer Science 2021-02-24 Shengchao Liu, Dimitris Papailiopoulos, Dimitris Achlioptas