Related papers: Path Sample-Analytic Gradient Estimators for Stoch…

Estimating or Propagating Gradients Through Stochastic Neurons

Stochastic neurons can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic neurons, i.e.,…

Machine Learning · Computer Science 2013-05-15 Yoshua Bengio

Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks

Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights. Many successful experimental results have been achieved with empirical…

Machine Learning · Statistics 2021-10-20 Alexander Shekhovtsov, Viktor Yanush

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

Stochastic neurons and hard non-linearities can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such…

Machine Learning · Computer Science 2013-08-16 Yoshua Bengio, Nicholas Léonard, Aaron Courville

AdaSTE: An Adaptive Straight-Through Estimator to Train Binary Neural Networks

We propose a new algorithm for training deep neural networks (DNNs) with binary weights. In particular, we first cast the problem of training binary neural networks (BiNNs) as a bilevel optimization instance and subsequently construct…

Machine Learning · Computer Science 2021-12-07 Huu Le, Rasmus Kjær Høier, Che-Tsung Lin, Christopher Zach

Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples

Self-paced learning and hard example mining re-weight training instances to improve learning accuracy. This paper presents two improved alternatives based on lightweight estimates of sample uncertainty in stochastic gradient descent (SGD):…

Machine Learning · Statistics 2018-01-09 Haw-Shiuan Chang, Erik Learned-Miller, Andrew McCallum

Bias-Variance Tradeoffs in Single-Sample Binary Gradient Estimators

Discrete and especially binary random variables occur in many machine learning models, notably in variational autoencoders with binary latent states and in stochastic binary networks. When learning such models, a key tool is an estimator of…

Machine Learning · Computer Science 2021-10-18 Alexander Shekhovtsov

Noise Optimization for Artificial Neural Networks

Adding noise to artificial neural networks (ANNs) has been shown in previous work to improve robustness. In this work, we propose a new technique to compute the pathwise stochastic gradient estimate with respect to the standard…

Machine Learning · Computer Science 2021-02-10 Li Xiao, Zeliang Zhang, Yijie Peng

Probabilistic Binary Neural Networks

Low bit-width weights and activations are an effective way of combating the increasing need for both memory and compute power of Deep Neural Networks. In this work, we present a probabilistic training method for Neural Network with both…

Machine Learning · Computer Science 2018-09-11 Jorn W. T. Peters, Max Welling

Progressive Stochastic Binarization of Deep Networks

A plethora of recent research has focused on improving the memory footprint and inference speed of deep networks by reducing the complexity of (i) numerical representations (for example, by deterministic or stochastic quantization) and (ii)…

Machine Learning · Computer Science 2019-04-05 David Hartmann, Michael Wand

On the role of synaptic stochasticity in training low-precision neural networks

Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights…

Disordered Systems and Neural Networks · Physics 2018-07-04 Carlo Baldassi, Federica Gerace, Hilbert J. Kappen, Carlo Lucibello, Luca Saglietti, Enzo Tartaglione, Riccardo Zecchina

Perturbative estimation of stochastic gradients

In this paper we introduce a family of stochastic gradient estimation techniques based on the perturbative expansion around the mean of the sampling distribution. We characterize the bias and variance of the resulting Taylor-corrected…

Machine Learning · Statistics 2019-11-18 Luca Ambrogioni, Marcel A. J. van Gerven

Gradient Estimation Using Stochastic Computation Graphs

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external…

Machine Learning · Computer Science 2016-01-06 John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel

Binary stochasticity enabled highly efficient neuromorphic deep learning achieves better-than-software accuracy

Deep learning needs high-precision handling of forwarding signals, backpropagating errors, and updating weights. This is inherently required by the learning algorithm since the gradient descent learning rule relies on the chain product of…

Neural and Evolutionary Computing · Computer Science 2024-12-30 Yang Li, Wei Wang, Ming Wang, Chunmeng Dou, Zhengyu Ma, Huihui Zhou, Peng Zhang, Nicola Lepri, Xumeng Zhang, Qing Luo, Xiaoxin Xu, Guanhua Yang, Feng Zhang, Ling Li, Daniele Ielmini, Ming Liu

A Stochastic Gradient Method with Biased Estimation for Faster Nonconvex Optimization

A number of optimization approaches have been proposed for optimizing nonconvex objectives (e.g. deep learning models), such as batch gradient descent, stochastic gradient descent and stochastic variance reduced gradient descent. Theory…

Machine Learning · Computer Science 2019-05-15 Jia Bi, Steve R. Gunn

Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction

We study online inference and asymptotic covariance estimation for the stochastic gradient descent (SGD) algorithm. While classical methods (such as plug-in and batch-means estimators) are available, they either require inaccessible…

Machine Learning · Statistics 2026-04-24 Ziyang Wei, Wanrong Zhu, Jingyang Lyu, Wei Biao Wu

Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity

Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the dynamics of stochastic gradient descent over diagonal linear…

Machine Learning · Computer Science 2021-12-08 Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion

Bolstering Stochastic Gradient Descent with Model Building

Stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates for solving machine learning problems. These rates are obtained especially when these algorithms are…

Machine Learning · Computer Science 2024-03-14 S. Ilker Birbil, Ozgur Martin, Gonenc Onay, Figen Oztoprak

Stochastic Gradient Descent with Biased but Consistent Gradient Estimators

Stochastic gradient descent (SGD), which dates back to the 1950s, is one of the most popular and effective approaches for performing stochastic optimization. Research on SGD resurged recently in machine learning for optimizing convex loss…

Machine Learning · Computer Science 2019-12-24 Jie Chen, Ronny Luss

Injecting Logical Constraints into Neural Networks via Straight-Through Estimators

Injecting discrete logical constraints into neural network learning is one of the main challenges in neuro-symbolic AI. We find that a straight-through-estimator, a method introduced to train binary neural networks, could effectively be…

Artificial Intelligence · Computer Science 2023-07-11 Zhun Yang, Joohyung Lee, Chiyoun Park

Training Neural Networks with Optimal Double-Bayesian Learning

Backpropagation with gradient descent is a common optimization strategy employed by most neural network architectures in machine learning. However, finding optimal hyperparameters to guide training has proven challenging. While it is widely…

Machine Learning · Computer Science 2026-05-20 Vy Bui, Hang Yu, Karthik Kantipudi, Ziv Yaniv, Stefan Jaeger