Related papers: Depthwise Discrete Representation Learning

Neural Discrete Representation Learning

Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector…

Machine Learning · Computer Science 2018-05-31 Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu

A Comparison of Discrete Latent Variable Models for Speech Representation Learning

Neural latent variable models enable the discovery of interesting structure in speech audio data. This paper presents a comparison of two different approaches which are broadly based on predicting future time-steps or auto-encoding the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-28 Henry Zhou, Alexei Baevski, Michael Auli

Theory and Experiments on Vector Quantized Autoencoders

Deep neural networks with discrete latent variables offer the promise of better symbolic reasoning, and learning abstractions that are more useful to new tasks. There has been a surge in interest in discrete latent variable models, however,…

Machine Learning · Computer Science 2018-07-23 Aurko Roy, Ashish Vaswani, Arvind Neelakantan, Niki Parmar

Robust Training of Vector Quantized Bottleneck Models

In this paper we demonstrate methods for reliable and efficient training of discrete representation using Vector-Quantized Variational Auto-Encoder models (VQ-VAEs). Discrete latent variable models have been shown to learn nontrivial…

Machine Learning · Computer Science 2024-09-13 Adrian Łańcucki, Jan Chorowski, Guillaume Sanchez, Ricard Marxer, Nanxin Chen, Hans J. G. A. Dolfing, Sameer Khurana, Tanel Alumäe, Antoine Laurent

Vector Quantized Wasserstein Auto-Encoder

Learning deep discrete latent presentations offers a promise of better symbolic and summarized abstractions that are more useful to subsequent downstream tasks. Inspired by the seminal Vector Quantized Variational Auto-Encoder (VQ-VAE),…

Machine Learning · Computer Science 2023-06-21 Tung-Long Vuong, Trung Le, He Zhao, Chuanxia Zheng, Mehrtash Harandi, Jianfei Cai, Dinh Phung

Disentanglement with Factor Quantized Variational Autoencoders

Disentangled representation learning aims to represent the underlying generative factors of a dataset in a latent representation independently of one another. In our work, we propose a discrete variational autoencoder (VAE) based model…

Computer Vision and Pattern Recognition · Computer Science 2025-11-06 Gulcin Baykal, Melih Kandemir, Gozde Unal

Arch-VQ: Discrete Architecture Representation Learning with Autoregressive Priors

Existing neural architecture representation learning methods focus on continuous representation learning, typically using Variational Autoencoders (VAEs) to map discrete architectures onto a continuous Gaussian distribution. However,…

Machine Learning · Computer Science 2026-03-19 Deshani Geethika Poddenige, Sachith Seneviratne, Asela Hevapathige, Damith Senanayake, Mahesan Niranjan, PN Suganthan, Saman Halgamuge

Independent Subspace Analysis for Unsupervised Learning of Disentangled Representations

Recently there has been an increased interest in unsupervised learning of disentangled representations using the Variational Autoencoder (VAE) framework. Most of the existing work has focused largely on modifying the variational cost…

Machine Learning · Statistics 2019-09-12 Jan Stühmer, Richard E. Turner, Sebastian Nowozin

Unsupervised speech representation learning using WaveNet autoencoders

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content…

Machine Learning · Computer Science 2019-09-12 Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord

Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study

Learning the latent representation of data in unsupervised fashion is a very interesting process that provides relevant features for enhancing the performance of a classifier. For speech emotion recognition tasks, generating effective…

Sound · Computer Science 2020-07-29 Siddique Latif, Rajib Rana, Junaid Qadir, Julien Epps

Improving VAE-based Representation Learning

Latent variable models like the Variational Auto-Encoder (VAE) are commonly used to learn representations of images. However, for downstream tasks like semantic classification, the representations learned by VAE are less competitive than…

Machine Learning · Statistics 2022-05-31 Mingtian Zhang, Tim Z. Xiao, Brooks Paige, David Barber

Learning Disentangled Discrete Representations

Recent successes in image generation, model-based reinforcement learning, and text-to-image generation have demonstrated the empirical advantages of discrete latent representations, although the reasons behind their benefits remain unclear.…

Machine Learning · Computer Science 2023-07-27 David Friede, Christian Reimers, Heiner Stuckenschmidt, Mathias Niepert

Diffusion bridges vector quantized Variational AutoEncoders

Vector Quantized-Variational AutoEncoders (VQ-VAE) are generative models based on discrete latent representations of the data, where inputs are mapped to a finite set of learned embeddings. To generate new samples, an autoregressive prior…

Machine Learning · Statistics 2022-08-04 Max Cohen, Guillaume Quispe, Sylvain Le Corff, Charles Ollion, Eric Moulines

Learning Latent Representations for Speech Generation and Transformation

An ability to model a generative process and learn a latent representation for speech in an unsupervised fashion will be crucial to process vast quantities of unlabelled speech data. Recently, deep probabilistic generative models such as…

Computation and Language · Computer Science 2017-09-25 Wei-Ning Hsu, Yu Zhang, James Glass

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction

Vector Quantized Variational AutoEncoders (VQ-VAE) are a powerful representation learning framework that can discover discrete groups of features from a speech signal without supervision. Until now, the VQ-VAE architecture has previously…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Yi Zhao, Haoyu Li, Cheng-I Lai, Jennifer Williams, Erica Cooper, Junichi Yamagishi

VAEVQ: Enhancing Discrete Visual Tokenization through Variational Modeling

Vector quantization (VQ) transforms continuous image features into discrete representations, providing compressed, tokenized inputs for generative models. However, VQ-based frameworks suffer from several issues, such as non-smooth latent…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Sicheng Yang, Xing Hu, Qiang Wu, Dawei Yang

An Introduction to Discrete Variational Autoencoders

Variational Autoencoders (VAEs) are well-established as a principled approach to probabilistic unsupervised learning with neural networks. Typically, an encoder network defines the parameters of a Gaussian distributed latent space from…

Machine Learning · Computer Science 2025-05-16 Alan Jeffares, Liyuan Liu

Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech

We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE) architecture using a split vector quantizer for NTTS, as an enhancement to the well-known Variational Autoencoder (VAE) and Vector Quantized Variational Autoencoder…

Sound · Computer Science 2023-09-15 Marek Strong, Jonas Rohnke, Antonio Bonafonte, Mateusz Łajszczak, Trevor Wood

Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE

The human perception system is often assumed to recruit motor knowledge when processing auditory speech inputs. Using articulatory modeling and deep learning, this study examines how this articulatory information can be used for discovering…

Computation and Language · Computer Science 2022-06-20 Marc-Antoine Georges, Jean-Luc Schwartz, Thomas Hueber

LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling

Learning compact and meaningful latent space representations has been shown to be very useful in generative modeling tasks for visual data. One particular example is applying Vector Quantization (VQ) in variational autoencoders (VQ-VAEs,…

Machine Learning · Computer Science 2024-09-18 Xin Li, Anand Sarwate