Related papers: Gradient Estimation with Stochastic Softmax Tricks
Estimating the gradients of stochastic nodes in stochastic computational graphs is one of the crucial research questions in the deep generative modeling community, which enables the gradient descent optimization on neural network…
In many applications we seek to maximize an expectation with respect to a distribution over discrete variables. Estimating gradients of such objectives with respect to the distribution parameters is a challenging problem. We analyze…
Many machine learning tasks require sampling a subset of items from a collection based on a parameterized distribution. The Gumbel-softmax trick can be used to sample a single item, and allows for low-variance reparameterized gradients with…
Categorical variables are a natural choice for representing discrete structure in the world. However, stochastic neural networks rarely use categorical latent variables due to the inability to backpropagate through samples. In this work, we…
Structured latent variables allow incorporating meaningful prior knowledge into deep learning models. However, learning with such variables remains challenging because of their discrete nature. Nowadays, the standard learning approach is to…
The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by its unnormalized (log-)probabilities. Over the past years, the machine learning community has proposed several extensions of this trick to…
In computer science, there exist a large number of optimization problems defined on graphs, that is to find a best node state configuration or a network structure such that the designed objective function is optimized under some…
Stochastic kinetic models describe systems across biology, chemistry, and physics where discrete events and small populations render deterministic approximations inadequate. Parameter inference and inverse design in these systems require…
The Gumbel-Softmax probability distribution allows learning discrete tokens in generative learning, while the Gumbel-Argmax probability distribution is useful in learning discrete structures in discriminative learning. Despite the efforts…
The Gumbel trick is a method to sample from a discrete probability distribution, or to estimate its normalizing partition function. The method relies on repeatedly applying a random perturbation to the distribution in a particular way, each…
Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more…
In optimizing real-world structures, due to fabrication or budgetary restraints, the design variables may be restricted to a set of standard engineering choices. Such variables, commonly called categorical variables, are discrete and…
The Gumbel-Softmax is a continuous distribution over the simplex that is often used as a relaxation of discrete distributions. Because it can be readily interpreted and easily reparameterized, it enjoys widespread use. We propose a modular…
Variational approximations are increasingly based on gradient-based optimization of expectations estimated by sampling. Handling discrete latent variables is then challenging because the sampling process is not differentiable. Continuous…
ML models often operate within the context of a larger system that can adapt its response when the ML model is uncertain, such as falling back on safe defaults or a human in the loop. This commonly encountered operational context calls for…
Learning activation functions has emerged as a promising direction in deep learning, allowing networks to adapt activation mechanisms to task-specific demands. In this work, we introduce a novel framework that employs the Gumbel-Softmax…
We introduce a family of pairwise stochastic gradient estimators for gradients of expectations, which are related to the log-derivative trick, but involve pairwise interactions between samples. The simplest example of our new estimator,…
Many problems in real life can be converted to combinatorial optimization problems (COPs) on graphs, that is to find a best node state configuration or a network structure such that the designed objective function is optimized under some…
Gradient estimation in models with discrete latent variables is a challenging problem, because the simplest unbiased estimators tend to have high variance. To counteract this, modern estimators either introduce bias, rely on multiple…
Softmax is widely used in neural networks for multiclass classification, gate structure and attention mechanisms. The statistical assumption that the input is normal distributed supports the gradient stability of Softmax. However, when used…