Related papers: Learning ReLU Networks on Linearly Separable Data:…

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

We study the problem of training deep neural networks with Rectified Linear Unit (ReLU) activation function using gradient descent and stochastic gradient descent. In particular, we study the binary classification problem and show that for…

Machine Learning · Computer Science 2018-12-31 Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu

Convergence Analysis of Two-layer Neural Networks with ReLU Activation

In recent years, stochastic gradient descent (SGD) based techniques have become the standard tools for training neural networks. However, formal theoretical understanding of why SGD can train neural networks in practice is largely missing.…

Machine Learning · Computer Science 2017-11-03 Yuanzhi Li, Yang Yuan

SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data

Neural networks exhibit good generalization behavior in the over-parameterized regime, where the number of network parameters exceeds the number of observations. Nonetheless, current generalization bounds for neural networks fail to explain…

Machine Learning · Computer Science 2017-10-30 Alon Brutzkus, Amir Globerson, Eran Malach, Shai Shalev-Shwartz

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class…

Machine Learning · Computer Science 2019-08-02 Yuanzhi Li, Yingyu Liang

Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks

Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning. In this work, we take a mean-field view, and consider a two-layer ReLU network trained via SGD for…

Machine Learning · Computer Science 2022-05-02 Alexander Shevchenko, Vyacheslav Kungurtsev, Marco Mondelli

A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD…

Numerical Analysis · Mathematics 2022-09-28 Arnulf Jentzen, Adrian Riekert

A global convergence theory for deep ReLU implicit networks via over-parameterization

Implicit deep learning has received increasing attention recently due to the fact that it generalizes the recursive prediction rules of many commonly used neural network architectures. Its prediction rule is provided implicitly based on the…

Machine Learning · Computer Science 2022-02-21 Tianxiang Gao, Hailiang Liu, Jia Liu, Hridesh Rajan, Hongyang Gao

Learning One-hidden-layer ReLU Networks via Gradient Descent

We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher…

Machine Learning · Statistics 2018-06-21 Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu

On the Convergence Rate of Training Recurrent Neural Networks

How can local-search methods such as stochastic gradient descent (SGD) avoid bad local minima in training multi-layer neural networks? Why can they fit random labels even given non-convex and non-smooth architectures? Most existing theory…

Machine Learning · Computer Science 2019-05-28 Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song

Is Stochastic Gradient Descent Near Optimal?

The success of neural networks over the past decade has established them as effective models for many relevant data generating processes. Statistical theory on neural networks indicates graceful scaling of sample complexity. For example,…

Machine Learning · Computer Science 2023-03-28 Yifan Zhu, Hong Jun Jeon, Benjamin Van Roy

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

The generalization mystery of overparametrized deep nets has motivated efforts to understand how gradient descent (GD) converges to low-loss solutions that generalize well. Real-life neural networks are initialized from small random values…

Machine Learning · Computer Science 2021-11-10 Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora

Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent

We prove that two-layer (Leaky)ReLU networks initialized by e.g. the widely used method proposed by He et al. (2015) and trained using gradient descent on a least-squares loss are not universally consistent. Specifically, we describe a…

Machine Learning · Statistics 2022-06-10 David Holzmüller, Ingo Steinwart

Learning ReLU Networks via Alternating Minimization

We propose and analyze a new family of algorithms for training neural networks with ReLU activations. Our algorithms are based on the technique of alternating minimization: estimating the activation patterns of each ReLU for all given…

Machine Learning · Computer Science 2018-10-12 Gauri Jagatap, Chinmay Hegde

When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?

We study the implicit bias of gradient descent methods in solving a binary classification problem over a linearly separable dataset. The classifier is described by a nonlinear ReLU model and the objective function adopts the exponential…

Machine Learning · Computer Science 2018-10-17 Tengyu Xu, Yi Zhou, Kaiyi Ji, Yingbin Liang

Training a Two Layer ReLU Network Analytically

Neural networks are usually trained with different variants of gradient descent based optimization algorithms such as stochastic gradient descent or the Adam optimizer. Recent theoretical work states that the critical points (where the…

Machine Learning · Computer Science 2024-10-15 Adrian Barbu

Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise

We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization. We prove that SGD produces neural networks that have classification accuracy…

Machine Learning · Computer Science 2021-02-16 Spencer Frei, Yuan Cao, Quanquan Gu

Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs) but till this day it remains an open problem of research to provide a…

Machine Learning · Computer Science 2023-06-26 Martin Hutzenthaler, Arnulf Jentzen, Katharina Pohl, Adrian Riekert, Luca Scarpa

A Convergence Theory for Deep Learning via Over-Parameterization

Deep neural networks (DNNs) have demonstrated dominating performance in many fields; since AlexNet, networks used in practice are going wider and deeper. On the theoretical side, a long line of works has been focusing on training neural…

Machine Learning · Computer Science 2019-06-18 Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song

Fitting ReLUs via SGD and Quantized SGD

In this paper we focus on the problem of finding the optimal weights of the shallowest of neural networks consisting of a single Rectified Linear Unit (ReLU). These functions are of the form $\mathbf{x}\rightarrow…

Machine Learning · Computer Science 2019-04-02 Seyed Mohammadreza Mousavi Kalan, Mahdi Soltanolkotabi, A. Salman Avestimehr

Decoupling Gating from Linearity

ReLU neural-networks have been in the focus of many recent theoretical works, trying to explain their empirical success. Nonetheless, there is still a gap between current theoretical results and empirical observations, even in the case of…

Machine Learning · Computer Science 2019-06-13 Jonathan Fiat, Eran Malach, Shai Shalev-Shwartz