Nonparallel Emotional Speech Conversion

Jian Gao; Deep Chakraborty; Hamidou Tembine; Olaitan Olaleye

doi:10.21437/Interspeech.2019-2878


Machine Learning 2020-04-14 v3 Audio and Speech Processing Machine Learning



Abstract

We propose a nonparallel data-driven emotional speech conversion method. It enables the transfer of emotion-related characteristics of a speech signal while preserving the speaker's identity and linguistic content. Most existing approaches require parallel data and time alignment, which are not available in most real applications. We achieve nonparallel training based on an unsupervised style transfer technique, which learns a translation model between two distributions instead of a deterministic one-to-one mapping between paired examples. The conversion model consists of an encoder and a decoder for each emotion domain. We assume that the speech signal can be decomposed into an emotion-invariant content code and an emotion-related style code in latent space. Emotion conversion is performed by extracting and recombining the content code of the source speech and the style code of the target emotion. We tested our method on a nonparallel corpus with four emotions. Both subjective and objective evaluations show the effectiveness of our approach.
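
The recombination step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the class name `ToyEmotionDomain`, the feature and code dimensions, the untrained random linear maps standing in for the encoders and decoders, the time-averaged style code, and the choice to take the target style from a reference utterance. It only shows the data flow: content code from the source, style code from the target emotion domain, decoded by the target domain's decoder.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
FEAT_DIM, CONTENT_DIM, STYLE_DIM = 24, 8, 4


class ToyEmotionDomain:
    """Toy encoder/decoder pair for one emotion domain (random, untrained)."""

    def __init__(self, rng):
        self.W_content = rng.standard_normal((CONTENT_DIM, FEAT_DIM)) * 0.1
        self.W_style = rng.standard_normal((STYLE_DIM, FEAT_DIM)) * 0.1
        self.W_dec = rng.standard_normal((FEAT_DIM, CONTENT_DIM + STYLE_DIM)) * 0.1

    def encode(self, frames):
        """Split (T, FEAT_DIM) frames into a content code and a style code."""
        content = frames @ self.W_content.T             # (T, CONTENT_DIM), one per frame
        style = (frames @ self.W_style.T).mean(axis=0)  # (STYLE_DIM,), one per utterance
        return content, style

    def decode(self, content, style):
        """Rebuild (T, FEAT_DIM) frames from a content code and a style code."""
        style_tiled = np.tile(style, (content.shape[0], 1))  # repeat style for every frame
        latent = np.concatenate([content, style_tiled], axis=1)
        return latent @ self.W_dec.T


def convert(source_frames, target_frames, source_domain, target_domain):
    """Keep the source content code, swap in the target emotion's style code."""
    content, _ = source_domain.encode(source_frames)
    _, style = target_domain.encode(target_frames)
    return target_domain.decode(content, style)


rng = np.random.default_rng(0)
neutral, angry = ToyEmotionDomain(rng), ToyEmotionDomain(rng)
src = rng.standard_normal((50, FEAT_DIM))  # stand-in for source speech features
ref = rng.standard_normal((80, FEAT_DIM))  # stand-in for a target-emotion utterance
converted = convert(src, ref, neutral, angry)
print(converted.shape)  # (50, 24)
```

The converted output keeps the source's frame count because the content code is per frame, while the style code is a single vector for the whole utterance. In a trained model the linear maps would be replaced by learned networks.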

Keywords

speech emotion recognition, speech translation, voice conversion

Cite

@article{arxiv.1811.01174,
  title  = {Nonparallel Emotional Speech Conversion},
  author = {Jian Gao and Deep Chakraborty and Hamidou Tembine and Olaitan Olaleye},
  journal= {arXiv preprint arXiv:1811.01174},
  year   = {2020}
}

Comments

Published in INTERSPEECH 2019, 5 pages, 6 figures. Simulation available at http://www.jian-gao.org/emogan
