Related papers: Gradient Estimation with Stochastic Softmax Tricks

Generalized Gumbel-Softmax Gradient Estimator for Generic Discrete Random Variables

Estimating the gradients of stochastic nodes in stochastic computational graphs is one of the crucial research questions in the deep generative modeling community, which enables the gradient descent optimization on neural network…

Machine Learning · Computer Science 2023-02-23 Weonyoung Joo, Dongjun Kim, Seungjae Shin, Il-Chul Moon

Improved Gradient-Based Optimization Over Discrete Distributions

In many applications we seek to maximize an expectation with respect to a distribution over discrete variables. Estimating gradients of such objectives with respect to the distribution parameters is a challenging problem. We analyze…

Machine Learning · Statistics 2019-06-18 Evgeny Andriyash, Arash Vahdat, Bill Macready

Reparameterizable Subset Sampling via Continuous Relaxations

Many machine learning tasks require sampling a subset of items from a collection based on a parameterized distribution. The Gumbel-softmax trick can be used to sample a single item, and allows for low-variance reparameterized gradients with…

Machine Learning · Computer Science 2021-03-02 Sang Michael Xie, Stefano Ermon

Categorical Reparameterization with Gumbel-Softmax

Categorical variables are a natural choice for representing discrete structure in the world. However, stochastic neural networks rarely use categorical latent variables due to the inability to backpropagate through samples. In this work, we…

Machine Learning · Statistics 2017-08-08 Eric Jang, Shixiang Gu, Ben Poole

Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces

Structured latent variables allow incorporating meaningful prior knowledge into deep learning models. However, learning with such variables remains challenging because of their discrete nature. Nowadays, the standard learning approach is to…

Machine Learning · Computer Science 2021-10-29 Kirill Struminsky, Artyom Gadetsky, Denis Rakitin, Danil Karpushkin, Dmitry Vetrov

A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning

The Gumbel-max trick is a method to draw a sample from a categorical distribution, given by its unnormalized (log-)probabilities. Over the past years, the machine learning community has proposed several extensions of this trick to…

Machine Learning · Computer Science 2022-03-09 Iris A. M. Huijben, Wouter Kool, Max B. Paulus, Ruud J. G. van Sloun

Gumbel-softmax-based Optimization: A Simple General Framework for Optimization Problems on Graphs

In computer science, there exist a large number of optimization problems defined on graphs, that is to find a best node state configuration or a network structure such that the designed objective function is optimized under some…

Machine Learning · Computer Science 2020-04-17 Yaoxin Li, Jing Liu, Guozheng Lin, Yueyuan Hou, Muyun Mou, Jiang Zhang

Gradient-based optimization of exact stochastic kinetic models

Stochastic kinetic models describe systems across biology, chemistry, and physics where discrete events and small populations render deterministic approximations inadequate. Parameter inference and inverse design in these systems require…

Computational Physics · Physics 2026-03-06 Francesco Mottes, Qian-Ze Zhu, Michael P. Brenner

On The Statistical Representation Properties Of The Perturb-Softmax And The Perturb-Argmax Probability Distributions

The Gumbel-Softmax probability distribution allows learning discrete tokens in generative learning, while the Gumbel-Argmax probability distribution is useful in learning discrete structures in discriminative learning. Despite the efforts…

Machine Learning · Computer Science 2024-06-05 Hedda Cohen Indelman, Tamir Hazan

Lost Relatives of the Gumbel Trick

The Gumbel trick is a method to sample from a discrete probability distribution, or to estimate its normalizing partition function. The method relies on repeatedly applying a random perturbation to the distribution in a particular way, each…

Machine Learning · Statistics 2017-06-14 Matej Balog, Nilesh Tripuraneni, Zoubin Ghahramani, Adrian Weller

Efficient Learning of Discrete-Continuous Computation Graphs

Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more…

Machine Learning · Computer Science 2023-07-27 David Friede, Mathias Niepert

Optimal design of frame structures with mixed categorical and continuous design variables using the Gumbel-Softmax method

In optimizing real-world structures, due to fabrication or budgetary restraints, the design variables may be restricted to a set of standard engineering choices. Such variables, commonly called categorical variables, are discrete and…

Computational Engineering, Finance, and Science · Computer Science 2025-01-03 Mehran Ebrahimi, Hyunmin Cheong, Pradeep Kumar Jayaraman, Farhad Javid

Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax

The Gumbel-Softmax is a continuous distribution over the simplex that is often used as a relaxation of discrete distributions. Because it can be readily interpreted and easily reparameterized, it enjoys widespread use. We propose a modular…

Machine Learning · Statistics 2022-08-30 Andres Potapczynski, Gabriel Loaiza-Ganem, John P. Cunningham

Reliable Categorical Variational Inference with Mixture of Discrete Normalizing Flows

Variational approximations are increasingly based on gradient-based optimization of expectations estimated by sampling. Handling discrete latent variables is then challenging because the sampling process is not differentiable. Continuous…

Machine Learning · Computer Science 2021-02-09 Tomasz Kuśmierczyk, Arto Klami

Gumbel-Softmax Selective Networks

ML models often operate within the context of a larger system that can adapt its response when the ML model is uncertain, such as falling back on safe defaults or a human in the loop. This commonly encountered operational context calls for…

Machine Learning · Computer Science 2022-11-22 Mahmoud Salem, Mohamed Osama Ahmed, Frederick Tung, Gabriel Oliveira

FlexAct: Why Learn when you can Pick?

Learning activation functions has emerged as a promising direction in deep learning, allowing networks to adapt activation mechanisms to task-specific demands. In this work, we introduce a novel framework that employs the Gumbel-Softmax…

Machine Learning · Computer Science 2026-01-13 Ramnath Kumar, Kyle Ritscher, Junmin Judy, Lawrence Liu, Cho-Jui Hsieh

New Tricks for Estimating Gradients of Expectations

We introduce a family of pairwise stochastic gradient estimators for gradients of expectations, which are related to the log-derivative trick, but involve pairwise interactions between samples. The simplest example of our new estimator,…

Machine Learning · Computer Science 2022-04-21 Christian J. Walder, Paul Roussel, Richard Nock, Cheng Soon Ong, Masashi Sugiyama

Gumbel-softmax Optimization: A Simple General Framework for Combinatorial Optimization Problems on Graphs

Many problems in real life can be converted to combinatorial optimization problems (COPs) on graphs, that is to find a best node state configuration or a network structure such that the designed objective function is optimized under some…

Machine Learning · Computer Science 2019-09-17 Jing Liu, Fei Gao, Jiang Zhang

Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator

Gradient estimation in models with discrete latent variables is a challenging problem, because the simplest unbiased estimators tend to have high variance. To counteract this, modern estimators either introduce bias, rely on multiple…

Machine Learning · Statistics 2020-10-13 Max B. Paulus, Chris J. Maddison, Andreas Krause

Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

Softmax is widely used in neural networks for multiclass classification, gate structure and attention mechanisms. The statistical assumption that the input is normally distributed supports the gradient stability of Softmax. However, when used…

Computer Vision and Pattern Recognition · Computer Science 2021-08-17 Shulun Wang, Bin Liu, Feng Liu