Related papers: Learning a Single Neuron with Gradient Methods

Learning a Single Neuron for Non-monotonic Activation Functions

We study the problem of learning a single neuron $\mathbf{x}\mapsto \sigma(\mathbf{w}^T\mathbf{x})$ with gradient descent (GD). All the existing positive results are limited to the case where $\sigma$ is monotonic. However, it is recently…

Machine Learning · Statistics 2022-02-17 Lei Wu

Learning a Single Neuron with Bias Using Gradient Descent

We theoretically study the fundamental problem of learning a single neuron with a bias term ($\mathbf{x} \mapsto \sigma(\langle\mathbf{w},\mathbf{x}\rangle + b)$) in the realizable setting with the ReLU activation, using gradient descent. Perhaps…

Machine Learning · Computer Science 2022-02-08 Gal Vardi, Gilad Yehudai, Ohad Shamir

Learning a Single Neuron with Adversarial Label Noise via Gradient Descent

We study the fundamental problem of learning a single neuron, i.e., a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations $\sigma:\mathbb{R}\mapsto\mathbb{R}$, with respect to the $L_2^2$-loss…

Machine Learning · Computer Science 2022-06-20 Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis

Distribution-Specific Hardness of Learning Neural Networks

Although neural networks are routinely and successfully trained in practice using simple gradient-based methods, most existing theoretical results are negative, showing that learning such networks is difficult, in a worst-case sense over…

Machine Learning · Computer Science 2017-03-13 Ohad Shamir

Gradient learning in spiking neural networks by dynamic perturbation of conductances

We present a method of estimating the gradient of an objective function with respect to the synaptic weights of a spiking neural network. The method works by measuring the fluctuations in the objective function in response to dynamic…

Neurons and Cognition · Quantitative Biology 2007-05-23 Ila R. Fiete, H. Sebastian Seung

Agnostic Learning of a Single Neuron with Gradient Descent

We consider the problem of learning the best-fitting single neuron as measured by the expected square loss $\mathbb{E}_{(x,y)\sim \mathcal{D}}[(\sigma(w^\top x)-y)^2]$ over some unknown joint distribution $\mathcal{D}$ by using gradient…

Machine Learning · Computer Science 2020-09-01 Spencer Frei, Yuan Cao, Quanquan Gu

On the Complexity of Learning Neural Networks

The stunning empirical successes of neural networks currently lack rigorous theoretical explanation. What form would such an explanation take, in the face of existing complexity-theoretic lower bounds? A first step might be to show that…

Machine Learning · Computer Science 2017-07-18 Le Song, Santosh Vempala, John Wilmes, Bo Xie

On the Power and Limitations of Random Features for Understanding Neural Networks

Recently, a spate of papers have provided positive theoretical results for training over-parameterized neural networks (where the network size is larger than what is needed to achieve low error). The key insight is that with sufficient…

Machine Learning · Computer Science 2022-03-01 Gilad Yehudai, Ohad Shamir

Estimating or Propagating Gradients Through Stochastic Neurons

Stochastic neurons can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic neurons, i.e.,…

Machine Learning · Computer Science 2013-05-15 Yoshua Bengio

Learning Two-layer Neural Networks with Symmetric Inputs

We give a new algorithm for learning a two-layer neural network under a general class of input distributions. Assuming there is a ground-truth two-layer network $y = A \sigma(Wx) + \xi$, where $A,W$ are weight matrices, $\xi$ represents…

Machine Learning · Computer Science 2019-02-05 Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang

Learning One-hidden-layer neural networks via Provable Gradient Descent with Random Initialization

Although deep learning has shown its powerful performance in many applications, the mathematical principles behind neural networks are still mysterious. In this paper, we consider the problem of learning a one-hidden-layer neural network…

Machine Learning · Computer Science 2019-07-17 Shuhao Xia, Yuanming Shi

Topological Invariance and Breakdown in Learning

We prove that for a broad class of permutation-equivariant learning rules (including SGD, Adam, and others), the training process induces a bi-Lipschitz mapping between neurons and strongly constrains the topology of the neuron distribution…

Machine Learning · Computer Science 2025-10-06 Yongyi Yang, Tomaso Poggio, Isaac Chuang, Liu Ziyin

Variational Neural Networks: Every Layer and Neuron Can Be Unique

The choice of activation function can significantly influence the performance of neural networks. The lack of guiding principles for the selection of activation function is lamentable. We try to address this issue by introducing our…

Machine Learning · Computer Science 2018-10-16 Yiwei Li, Enzhi Li

Quadratic number of nodes is sufficient to learn a dataset via gradient descent

We prove that if an activation function satisfies some mild conditions and the number of neurons in a two-layered fully connected neural network with this activation function is beyond a certain threshold, then gradient descent on quadratic…

Optimization and Control · Mathematics 2019-11-14 Biswarup Das, Eugene A. Golikov

Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of…

Machine Learning · Computer Science 2016-01-13 Majid Janzamin, Hanie Sedghi, Anima Anandkumar

Gaussian Process Neurons Learn Stochastic Activation Functions

We propose stochastic, non-parametric activation functions that are fully learnable and individual to each neuron. Complexity and the risk of overfitting are controlled by placing a Gaussian process prior over these functions. The result is…

Machine Learning · Statistics 2017-12-01 Sebastian Urban, Marcus Basalla, Patrick van der Smagt

Breaking the Conventional Forward-Backward Tie in Neural Networks: Activation Functions

Gradient-based neural network training traditionally enforces symmetry between forward and backward propagation, requiring activation functions to be differentiable (or sub-differentiable) and strictly monotonic in certain regions to…

Neural and Evolutionary Computing · Computer Science 2025-09-10 Luigi Troiano, Francesco Gissi, Vincenzo Benedetto, Genny Tortora

Using Linear Regression for Iteratively Training Neural Networks

We present a simple linear regression based approach for learning the weights and biases of a neural network, as an alternative to standard gradient based backpropagation. The present work is exploratory in nature, and we restrict the…

Machine Learning · Computer Science 2023-07-17 Harshad Khadilkar

Learning One-hidden-layer ReLU Networks via Gradient Descent

We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher…

Machine Learning · Statistics 2018-06-21 Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu

Training Feedforward Neural Networks with Standard Logistic Activations is Feasible

Training feedforward neural networks with standard logistic activations is considered difficult because of the intrinsic properties of these sigmoidal functions. This work aims at showing that these networks can be trained to achieve…

Neural and Evolutionary Computing · Computer Science 2017-10-04 Emanuele Sansone, Francesco G. B. De Natale