Voice command generation using Progressive Wavegans

Thomas Wiest; Nicholas Cummins; Alice Baird; Simone Hantke; Judith Dineley; Björn Schuller

Computation and Language; Machine Learning; Sound; Audio and Speech Processing; Machine Learning (2019-03-19, v1)

Abstract

Generative Adversarial Networks (GANs) have become exceedingly popular in a wide range of data-driven research fields, due in part to their success in image generation. Their ability to generate new samples, often from only a small amount of input data, makes them an exciting research tool in areas with limited data resources. One less-explored application of GANs is the synthesis of speech and audio samples. Herein, we propose a set of extensions to the WaveGAN paradigm, a recently proposed approach for sound generation using GANs. The aim of these extensions (preprocessing, Audio-to-Audio generation, skip connections and progressive structures) is to improve the human likeness of synthetic speech samples. Scores from listening tests with 30 volunteers showed a moderate improvement in human likeness (Cohen's d of 0.65) for the proposed extensions compared to the original WaveGAN approach.
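
For readers unfamiliar with the effect-size measure quoted in the abstract, the following is a minimal sketch of Cohen's d in its common pooled-standard-deviation form. The abstract does not state which variant the authors computed, so the formula choice is an assumption. The function name `cohens_d` and the rating lists are hypothetical illustrations, not data from the listening tests.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical human-likeness ratings, invented purely for illustration.
extended_ratings = [4.0, 5.0, 3.0, 4.0, 4.0]
baseline_ratings = [3.0, 4.0, 3.0, 3.0, 2.0]

d = cohens_d(extended_ratings, baseline_ratings)
print(f"Cohen's d = {d:.2f}")
```

By Cohen's widely used conventions, values near 0.5 are read as medium effects and values near 0.8 as large, so a d of 0.65 sits between the two, in line with the "moderate" wording of the abstract.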

Keywords

generative adversarial network; voice conversion; audio generation

Cite

@article{arxiv.1903.07395,
  title  = {Voice command generation using Progressive Wavegans},
  author = {Thomas Wiest and Nicholas Cummins and Alice Baird and Simone Hantke and Judith Dineley and Björn Schuller},
  journal= {arXiv preprint arXiv:1903.07395},
  year   = {2019}
}

Comments

7 pages, 2 figures
