Related papers: Depthwise Discrete Representation Learning

200 papers

Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector…

Machine Learning · Computer Science 2018-05-31 Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu

Neural latent variable models enable the discovery of interesting structure in speech audio data. This paper presents a comparison of two different approaches which are broadly based on predicting future time-steps or auto-encoding the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-28 Henry Zhou, Alexei Baevski, Michael Auli

Deep neural networks with discrete latent variables offer the promise of better symbolic reasoning, and learning abstractions that are more useful to new tasks. There has been a surge in interest in discrete latent variable models, however,…

Machine Learning · Computer Science 2018-07-23 Aurko Roy, Ashish Vaswani, Arvind Neelakantan, Niki Parmar

In this paper we demonstrate methods for reliable and efficient training of discrete representation using Vector-Quantized Variational Auto-Encoder models (VQ-VAEs). Discrete latent variable models have been shown to learn nontrivial…

Learning deep discrete latent representations offers a promise of better symbolic and summarized abstractions that are more useful to subsequent downstream tasks. Inspired by the seminal Vector Quantized Variational Auto-Encoder (VQ-VAE),…

Machine Learning · Computer Science 2023-06-21 Tung-Long Vuong, Trung Le, He Zhao, Chuanxia Zheng, Mehrtash Harandi, Jianfei Cai, Dinh Phung

Disentangled representation learning aims to represent the underlying generative factors of a dataset in a latent representation independently of one another. In our work, we propose a discrete variational autoencoder (VAE) based model…

Computer Vision and Pattern Recognition · Computer Science 2025-11-06 Gulcin Baykal, Melih Kandemir, Gozde Unal

Existing neural architecture representation learning methods focus on continuous representation learning, typically using Variational Autoencoders (VAEs) to map discrete architectures onto a continuous Gaussian distribution. However,…

Recently there has been an increased interest in unsupervised learning of disentangled representations using the Variational Autoencoder (VAE) framework. Most of the existing work has focused largely on modifying the variational cost…

Machine Learning · Statistics 2019-09-12 Jan Stühmer, Richard E. Turner, Sebastian Nowozin

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content…

Machine Learning · Computer Science 2019-09-12 Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord

Learning the latent representation of data in unsupervised fashion is a very interesting process that provides relevant features for enhancing the performance of a classifier. For speech emotion recognition tasks, generating effective…

Sound · Computer Science 2020-07-29 Siddique Latif, Rajib Rana, Junaid Qadir, Julien Epps

Latent variable models like the Variational Auto-Encoder (VAE) are commonly used to learn representations of images. However, for downstream tasks like semantic classification, the representations learned by VAE are less competitive than…

Machine Learning · Statistics 2022-05-31 Mingtian Zhang, Tim Z. Xiao, Brooks Paige, David Barber

Recent successes in image generation, model-based reinforcement learning, and text-to-image generation have demonstrated the empirical advantages of discrete latent representations, although the reasons behind their benefits remain unclear.…

Machine Learning · Computer Science 2023-07-27 David Friede, Christian Reimers, Heiner Stuckenschmidt, Mathias Niepert

Vector Quantized-Variational AutoEncoders (VQ-VAE) are generative models based on discrete latent representations of the data, where inputs are mapped to a finite set of learned embeddings. To generate new samples, an autoregressive prior…

Machine Learning · Statistics 2022-08-04 Max Cohen, Guillaume Quispe, Sylvain Le Corff, Charles Ollion, Eric Moulines

An ability to model a generative process and learn a latent representation for speech in an unsupervised fashion will be crucial to process vast quantities of unlabelled speech data. Recently, deep probabilistic generative models such as…

Computation and Language · Computer Science 2017-09-25 Wei-Ning Hsu, Yu Zhang, James Glass

Vector Quantized Variational AutoEncoders (VQ-VAE) are a powerful representation learning framework that can discover discrete groups of features from a speech signal without supervision. Until now, the VQ-VAE architecture has…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Yi Zhao, Haoyu Li, Cheng-I Lai, Jennifer Williams, Erica Cooper, Junichi Yamagishi

Vector quantization (VQ) transforms continuous image features into discrete representations, providing compressed, tokenized inputs for generative models. However, VQ-based frameworks suffer from several issues, such as non-smooth latent…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Sicheng Yang, Xing Hu, Qiang Wu, Dawei Yang

Variational Autoencoders (VAEs) are well-established as a principled approach to probabilistic unsupervised learning with neural networks. Typically, an encoder network defines the parameters of a Gaussian distributed latent space from…

Machine Learning · Computer Science 2025-05-16 Alan Jeffares, Liyuan Liu

We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE) architecture using a split vector quantizer for NTTS, as an enhancement to the well-known Variational Autoencoder (VAE) and Vector Quantized Variational Autoencoder…

Sound · Computer Science 2023-09-15 Marek Strong, Jonas Rohnke, Antonio Bonafonte, Mateusz Łajszczak, Trevor Wood

The human perception system is often assumed to recruit motor knowledge when processing auditory speech inputs. Using articulatory modeling and deep learning, this study examines how this articulatory information can be used for discovering…

Computation and Language · Computer Science 2022-06-20 Marc-Antoine Georges, Jean-Luc Schwartz, Thomas Hueber

Learning compact and meaningful latent space representations has been shown to be very useful in generative modeling tasks for visual data. One particular example is applying Vector Quantization (VQ) in variational autoencoders (VQ-VAEs,…

Machine Learning · Computer Science 2024-09-18 Xin Li, Anand Sarwate