Variable-rate discrete representation learning

Sander Dieleman; Charlie Nash; Jesse Engel; Karen Simonyan

Machine Learning; Computation and Language; Sound; Audio and Speech Processing. 2021-03-11 (v1)

Abstract

Semantically meaningful information content in perceptual signals is usually unevenly distributed. In speech signals, for example, there are often many silences, and the speed of pronunciation can vary considerably. In this work, we propose slow autoencoders (SlowAEs) for unsupervised learning of high-level variable-rate discrete representations of sequences, and apply them to speech. We show that the resulting event-based representations automatically grow or shrink depending on the density of salient information in the input signals, while still allowing for faithful signal reconstruction. We develop run-length Transformers (RLTs) for event-based representation modelling and use them to construct language models in the speech domain, which are able to generate grammatical and semantically coherent utterances and continuations.
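
As an illustration of why an event-based representation can grow or shrink with the density of information, the sketch below run-length encodes two discrete sequences of equal length. It is a minimal toy example, not the authors' implementation: the function names `run_length_encode` and `run_length_decode` and the example code sequences are assumptions made here for illustration only.

```python
from itertools import groupby


def run_length_encode(codes):
    """Collapse each run of repeated values into one (value, run_length) event."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(codes)]


def run_length_decode(events):
    """Expand (value, run_length) events back into the original sequence."""
    decoded = []
    for value, length in events:
        decoded.extend([value] * length)
    return decoded


# Hypothetical discrete codes, 10 steps each (illustrative values only).
slow = [0, 0, 0, 0, 0, 0, 3, 3, 3, 3]  # slowly varying, e.g. mostly silence
fast = [0, 1, 1, 2, 0, 3, 3, 1, 2, 0]  # rapidly varying, dense content

slow_events = run_length_encode(slow)
fast_events = run_length_encode(fast)

print(len(slow_events), slow_events)
print(len(fast_events), fast_events)
```

Both inputs have the same length, but the slowly varying one collapses to far fewer events, and decoding recovers each input exactly. This is the sense in which the length of such a representation depends on how often the underlying codes change rather than on the duration of the signal.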

Keywords

variational autoencoder, sparse autoencoders, speech recognition

Cite

@article{arxiv.2103.06089,
  title  = {Variable-rate discrete representation learning},
  author = {Sander Dieleman and Charlie Nash and Jesse Engel and Karen Simonyan},
  journal= {arXiv preprint arXiv:2103.06089},
  year   = {2021}
}

Comments

26 pages, 15 figures, samples can be found at https://vdrl.github.io/
