Related papers: Variable-rate discrete representation learning

Learning Latent Representations for Speech Generation and Transformation

An ability to model a generative process and learn a latent representation for speech in an unsupervised fashion will be crucial to process vast quantities of unlabelled speech data. Recently, deep probabilistic generative models such as…

Computation and Language · Computer Science 2017-09-25 Wei-Ning Hsu, Yu Zhang, James Glass

Unsupervised speech representation learning using WaveNet autoencoders

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content…

Machine Learning · Computer Science 2019-09-12 Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord

Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study

Learning the latent representation of data in unsupervised fashion is a very interesting process that provides relevant features for enhancing the performance of a classifier. For speech emotion recognition tasks, generating effective…

Sound · Computer Science 2020-07-29 Siddique Latif, Rajib Rana, Junaid Qadir, Julien Epps

Text Modeling with Syntax-Aware Variational Autoencoders

Syntactic information contains structures and rules about how text sentences are arranged. Incorporating syntax into text modeling methods can potentially benefit both representation learning and generation. Variational autoencoders (VAEs)…

Computation and Language · Computer Science 2019-08-28 Yijun Xiao, William Yang Wang

Learning Robust Latent Representations for Controllable Speech Synthesis

State-of-the-art Variational Auto-Encoders (VAEs) for learning disentangled latent representations give impressive results in discovering features like pitch, pause duration, and accent in speech data, leading to highly controllable…

Sound · Computer Science 2021-05-11 Shakti Kumar, Jithin Pradeep, Hussain Zaidi

Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck

Variational autoencoders (VAEs) are essential tools in end-to-end representation learning. However, the sequential text generation common pitfall with VAEs is that the model tends to ignore latent variables with a strong auto-regressive…

Machine Learning · Computer Science 2021-02-26 Yang Zhao, Ping Yu, Suchismit Mahapatra, Qinliang Su, Changyou Chen

Interpretable Sentence Representation with Variational Autoencoders and Attention

In this thesis, we develop methods to enhance the interpretability of recent representation learning techniques in natural language processing (NLP) while accounting for the unavailability of annotated data. We choose to leverage…

Computation and Language · Computer Science 2023-05-05 Ghazi Felhi

Depthwise Discrete Representation Learning

Recent advancements in learning Discrete Representations as opposed to continuous ones have led to state of art results in tasks that involve Language, Audio and Vision. Some latent factors such as words, phonemes and shapes are better…

Machine Learning · Computer Science 2020-04-14 Iordanis Fostiropoulos

An Attribute-Aligned Strategy for Learning Speech Representation

Advancement in speech technology has brought convenience to our life. However, the concern is on the rise as speech signal contains multiple personal attributes, which would lead to either sensitive information leakage or bias toward…

Audio and Speech Processing · Electrical Eng. & Systems 2021-09-09 Yu-Lin Huang, Bo-Hao Su, Y.-W. Peter Hong, Chi-Chun Lee

Unsupervised Abstractive Sentence Summarization using Length Controlled Variational Autoencoder

In this work we present an unsupervised approach to summarize sentences in abstractive way using Variational Autoencoder (VAE). VAE are known to learn a semantically rich latent variable, representing high dimensional input. VAEs are…

Computation and Language · Computer Science 2018-09-24 Raphael Schumann

A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model,…

Sound · Computer Science 2021-06-15 Xiaoyu Bie, Laurent Girin, Simon Leglaive, Thomas Hueber, Xavier Alameda-Pineda

A Discrete CVAE for Response Generation on Short-Text Conversation

Neural conversation models such as encoder-decoder models are easy to generate bland and generic responses. Some researchers propose to use the conditional variational autoencoder(CVAE) which maximizes the lower bound on the conditional…

Computation and Language · Computer Science 2019-11-25 Jun Gao, Wei Bi, Xiaojiang Liu, Junhui Li, Guodong Zhou, Shuming Shi

An Introduction to Discrete Variational Autoencoders

Variational Autoencoders (VAEs) are well-established as a principled approach to probabilistic unsupervised learning with neural networks. Typically, an encoder network defines the parameters of a Gaussian distributed latent space from…

Machine Learning · Computer Science 2025-05-16 Alan Jeffares, Liyuan Liu

Predictive variational autoencoder for learning robust representations of time-series data

Variational autoencoders (VAEs) have been used extensively to discover low-dimensional latent factors governing neural activity and animal behavior. However, without careful model selection, the uncovered latent factors may reflect noise in…

Machine Learning · Computer Science 2023-12-13 Julia Huiming Wang, Dexter Tsin, Tatiana Engel

Towards Transferable Speech Emotion Representation: On loss functions for cross-lingual latent representations

In recent years, speech emotion recognition (SER) has been used in wide ranging applications, from healthcare to the commercial sector. In addition to signal processing approaches, methods for SER now also use deep learning techniques which…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-29 Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen

Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders

While sparse autoencoders (SAEs) successfully extract interpretable features from language models, applying them to audio generation faces unique challenges: audio's dense nature requires compression that obscures semantic meaning, and…

Machine Learning · Computer Science 2025-10-31 Nathan Paek, Yongyi Zang, Qihui Yang, Randal Leistikow

Dynamical Variational Autoencoders: A Comprehensive Review

Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, the input data…

Machine Learning · Computer Science 2022-07-05 Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda

A Comparison of Discrete Latent Variable Models for Speech Representation Learning

Neural latent variable models enable the discovery of interesting structure in speech audio data. This paper presents a comparison of two different approaches which are broadly based on predicting future time-steps or auto-encoding the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-28 Henry Zhou, Alexei Baevski, Michael Auli

Autoregressive Co-Training for Learning Discrete Speech Representations

While several self-supervised approaches for learning discrete speech representation have been proposed, it is unclear how these seemingly similar approaches relate to each other. In this paper, we consider a generative model with discrete…

Computation and Language · Computer Science 2022-11-01 Sung-Lin Yeh, Hao Tang

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies, evoking inefficiencies in sequence modeling. High-dimensional speech features such as spectrograms are often used as the input for the…

Computation and Language · Computer Science 2023-09-28 Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang