Related papers: Variable-rate discrete representation learning
An ability to model a generative process and learn a latent representation for speech in an unsupervised fashion will be crucial to process vast quantities of unlabelled speech data. Recently, deep probabilistic generative models such as…
We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content…
Learning the latent representation of data in an unsupervised fashion is a very interesting process that provides relevant features for enhancing the performance of a classifier. For speech emotion recognition tasks, generating effective…
Syntactic information contains structures and rules about how text sentences are arranged. Incorporating syntax into text modeling methods can potentially benefit both representation learning and generation. Variational autoencoders (VAEs)…
State-of-the-art Variational Auto-Encoders (VAEs) for learning disentangled latent representations give impressive results in discovering features like pitch, pause duration, and accent in speech data, leading to highly controllable…
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning. However, a common pitfall of VAEs in sequential text generation is that the model tends to ignore latent variables with a strong auto-regressive…
In this thesis, we develop methods to enhance the interpretability of recent representation learning techniques in natural language processing (NLP) while accounting for the unavailability of annotated data. We choose to leverage…
Recent advances in learning discrete representations, as opposed to continuous ones, have led to state-of-the-art results in tasks involving language, audio and vision. Some latent factors such as words, phonemes and shapes are better…
Advances in speech technology have brought convenience to our lives. However, concern is rising because speech signals contain multiple personal attributes, which could lead to either sensitive information leakage or bias toward…
In this work we present an unsupervised approach to summarizing sentences in an abstractive way using a Variational Autoencoder (VAE). VAEs are known to learn a semantically rich latent variable representing high-dimensional input. VAEs are…
The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model,…
Neural conversation models such as encoder-decoder models tend to generate bland and generic responses. Some researchers propose to use the conditional variational autoencoder (CVAE), which maximizes the lower bound on the conditional…
Variational Autoencoders (VAEs) are well-established as a principled approach to probabilistic unsupervised learning with neural networks. Typically, an encoder network defines the parameters of a Gaussian distributed latent space from…
Variational autoencoders (VAEs) have been used extensively to discover low-dimensional latent factors governing neural activity and animal behavior. However, without careful model selection, the uncovered latent factors may reflect noise in…
In recent years, speech emotion recognition (SER) has been used in wide-ranging applications, from healthcare to the commercial sector. In addition to signal processing approaches, methods for SER now also use deep learning techniques which…
While sparse autoencoders (SAEs) successfully extract interpretable features from language models, applying them to audio generation faces unique challenges: audio's dense nature requires compression that obscures semantic meaning, and…
Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, the input data…
Neural latent variable models enable the discovery of interesting structure in speech audio data. This paper presents a comparison of two different approaches which are broadly based on predicting future time-steps or auto-encoding the…
While several self-supervised approaches for learning discrete speech representations have been proposed, it is unclear how these seemingly similar approaches relate to each other. In this paper, we consider a generative model with discrete…
Speech signals, typically sampled at rates in the tens of thousands of samples per second, contain redundancies that cause inefficiencies in sequence modeling. High-dimensional speech features such as spectrograms are often used as the input for the…