Related papers: An Approximation Algorithm for training One-Node R…

Practical Convex Formulation of Robust One-hidden-layer Neural Network Training

Recent work has shown that the training of a one-hidden-layer, scalar-output fully-connected ReLU neural network can be reformulated as a finite-dimensional convex program. Unfortunately, the scale of such a convex program grows…

Machine Learning · Computer Science 2021-05-27 Yatong Bai, Tanmay Gautam, Yu Gai, Somayeh Sojoudi

Polynomial-Time Solutions for ReLU Network Training: A Complexity Classification via Max-Cut and Zonotopes

We investigate the complexity of training a two-layer ReLU neural network with weight decay regularization. Previous research has shown that the optimal solution of this problem can be found by solving a standard cone-constrained convex…

Machine Learning · Computer Science 2023-11-21 Yifei Wang, Mert Pilanci

Learning One-hidden-layer ReLU Networks via Gradient Descent

We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher…

Machine Learning · Statistics 2018-06-21 Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu

Tight Hardness Results for Training Depth-2 ReLU Networks

We prove several hardness results for training depth-2 neural networks with the ReLU activation function; these networks are simply weighted sums (that may include negative coefficients) of ReLUs. Our goal is to output a depth-2 neural…

Machine Learning · Computer Science 2020-11-30 Surbhi Goel, Adam Klivans, Pasin Manurangsi, Daniel Reichman

Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks

We study the sample complexity of learning one-hidden-layer convolutional neural networks (CNNs) with non-overlapping filters. We propose a novel algorithm called approximate gradient descent for training CNNs, and show that, with high…

Machine Learning · Computer Science 2019-11-13 Yuan Cao, Quanquan Gu

Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model

We derive approximation bounds for learning single neuron models using thresholded gradient descent when both the labels and the covariates are possibly corrupted adversarially. We assume the data follows the model $y =…

Machine Learning · Statistics 2024-09-06 Arvind Rathnashyam, Alex Gittens

The Computational Complexity of Training ReLU(s)

We consider the computational complexity of training depth-2 neural networks composed of rectified linear units (ReLUs). We show that, even for the case of a single ReLU, finding a set of weights that minimizes the squared error (even…

Computational Complexity · Computer Science 2018-10-17 Pasin Manurangsi, Daniel Reichman

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

We study the problem of training deep neural networks with Rectified Linear Unit (ReLU) activation function using gradient descent and stochastic gradient descent. In particular, we study the binary classification problem and show that for…

Machine Learning · Computer Science 2018-12-31 Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu

Learning Narrow One-Hidden-Layer ReLU Networks

We consider the well-studied problem of learning a linear combination of $k$ ReLU activations with respect to a Gaussian distribution on inputs in $d$ dimensions. We give the first polynomial-time algorithm that succeeds whenever $k$ is a…

Machine Learning · Computer Science 2023-04-21 Sitan Chen, Zehao Dou, Surbhi Goel, Adam R Klivans, Raghu Meka

Neural Network Approximation

Neural Networks (NNs) are the method of choice for building learning algorithms. Their popularity stems from their empirical success on several challenging learning problems. However, most scholars agree that a convincing theoretical…

Numerical Analysis · Mathematics 2021-01-01 Ronald DeVore, Boris Hanin, Guergana Petrova

Complexity of Training ReLU Neural Network

In this paper, we explore some basic questions on the complexity of training neural networks with ReLU activation function. We show that it is NP-hard to train a two-hidden layer feedforward ReLU neural network. If dimension of the input…

Computational Complexity · Computer Science 2020-11-05 Digvijay Boob, Santanu S. Dey, Guanghui Lan

Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time

In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem…

Machine Learning · Computer Science 2024-07-15 Sungyoon Kim, Mert Pilanci

Training (Overparametrized) Neural Networks in Near-Linear Time

The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks, initiated an ongoing effort for developing faster $\mathit{second}$-$\mathit{order}$ optimization algorithms…

Machine Learning · Computer Science 2020-12-10 Jan van den Brand, Binghui Peng, Zhao Song, Omri Weinstein

Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization

Neural networks with REctified Linear Unit (ReLU) activation functions (a.k.a. ReLU networks) have achieved great empirical success in various domains. Nonetheless, existing results for learning ReLU networks either pose assumptions on the…

Machine Learning · Statistics 2019-05-01 Gang Wang, Georgios B. Giannakis, Jie Chen

Agnostic Learning of Arbitrary ReLU Activation under Gaussian Marginals

We consider the problem of learning an arbitrarily-biased ReLU activation (or neuron) over Gaussian marginals with the squared loss objective. Despite the ReLU neuron being the basic building block of modern neural networks, we still do not…

Machine Learning · Computer Science 2026-02-04 Anxin Guo, Aravindan Vijayaraghavan

Approximating Activation Functions

ReLU is widely seen as the default choice for activation functions in neural networks. However, there are cases where more complicated functions are required. In particular, recurrent neural networks (such as LSTMs) make extensive use of…

Machine Learning · Computer Science 2020-01-20 Nicholas Gerard Timmons, Andrew Rice

Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs) but till this day it remains an open problem of research to provide a…

Machine Learning · Computer Science 2023-06-26 Martin Hutzenthaler, Arnulf Jentzen, Katharina Pohl, Adrian Riekert, Luca Scarpa

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

One of the mysteries in the success of neural networks is randomly initialized first order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. This paper demystifies…

Machine Learning · Computer Science 2019-02-06 Simon S. Du, Xiyu Zhai, Barnabas Poczos, Aarti Singh

Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation

The training of artificial neural networks (ANNs) with rectified linear unit (ReLU) activation via gradient descent (GD) type optimization schemes is nowadays a common industrially relevant procedure. Till this day in the scientific…

Machine Learning · Computer Science 2023-04-13 Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg S. Weiss

Optimization over Trained Neural Networks: Going Large with Gradient-Based Algorithms

When optimizing a nonlinear objective, one can employ a neural network as a surrogate for the nonlinear function. However, the resulting optimization model can be time-consuming to solve globally with exact methods. As a result, local…

Optimization and Control · Mathematics 2026-03-19 Jiatai Tong, Yilin Zhu, Thiago Serra, Samuel Burer