Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks

Sercan O. Arik; Heewoo Jun; Gregory Diamos

doi:10.1109/LSP.2018.2880284

Subjects: Sound; Machine Learning; Audio and Speech Processing. Version v2, 2018-12-26.

Abstract

We propose the multi-head convolutional neural network (MCNN) architecture for waveform synthesis from spectrograms. MCNN performs nonlinear interpolation with transposed convolution layers arranged in parallel heads. MCNN achieves more than an order of magnitude higher compute intensity than commonly used iterative algorithms such as Griffin-Lim. This makes efficient use of modern multi-core processors and enables very fast waveform synthesis (more than 300x real time). To train MCNN, we use a large-scale speech recognition dataset and losses defined on waveforms that are related to perceptual audio quality. We demonstrate that MCNN is a very promising approach to high-quality speech synthesis that requires neither iterative algorithms nor autoregression.
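
The abstract names the two main ingredients but not their configuration. The sketch below illustrates them under stated assumptions. First, several independent heads of 1-D transposed convolutions upsample spectrogram frames to waveform samples, and their outputs are scaled and summed. Second, a waveform is scored with losses computed on its STFT magnitudes. The head count, channel widths, strides, kernel width of 2x stride, ELU between layers, softsign output, STFT settings, the choice of spectral-convergence and log-magnitude losses, and every helper name (`conv_transpose1d`, `head_forward`, `mcnn_forward`, `spectral_convergence`, and so on) are hypothetical illustrations, not the paper's setup. The weights are random and untrained, so the output is not speech.

```python
import numpy as np

# Illustrative sketch: all sizes, nonlinearities and losses below are assumptions,
# not the configuration used in the paper.


def elu(x):
    # Exponential linear unit; clamping before expm1 avoids overflow for large positive x.
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


def conv_transpose1d(x, w, stride):
    """1-D transposed convolution. x: (C_in, T), w: (C_in, C_out, K) -> (C_out, (T-1)*stride + K)."""
    c_in, t = x.shape
    _, c_out, k = w.shape
    y = np.zeros((c_out, (t - 1) * stride + k))
    for i in range(t):
        # Each input frame scatters a weighted K-tap kernel into the output at offset i*stride.
        y[:, i * stride : i * stride + k] += np.einsum("c,cok->ok", x[:, i], w)
    return y


def make_head(rng, channels, strides):
    # Hypothetical random init; kernel width 2*stride is an assumption of this sketch.
    return [
        rng.normal(0.0, 1.0 / np.sqrt(c_in * 2 * s), size=(c_in, c_out, 2 * s))
        for c_in, c_out, s in zip(channels[:-1], channels[1:], strides)
    ]


def head_forward(spec, layers, strides):
    # Upsample (n_freq, n_frames) frames to one waveform of n_frames * prod(strides) samples.
    h = spec
    for j, (w, s) in enumerate(zip(layers, strides)):
        n_out = h.shape[1] * s
        h = conv_transpose1d(h, w, s)[:, :n_out]  # trim the kernel overhang
        if j < len(layers) - 1:
            h = elu(h)  # nonlinearity between layers makes the interpolation nonlinear
    return h[0]


def mcnn_forward(spec, heads, head_scales):
    # Heads are independent, so they can run in parallel; outputs are scaled and summed.
    y = sum(a * head_forward(spec, layers, st) for a, (layers, st) in zip(head_scales, heads))
    # Softsign squashes the sum into (-1, 1), the range of a normalized waveform.
    return y / (1.0 + np.abs(y))


def stft_mag(x, n_fft=256, hop=64):
    # Magnitude STFT with a Hann window; frames that would run past the end are dropped.
    window = np.hanning(n_fft)
    frames = np.array([x[i : i + n_fft] * window for i in range(0, len(x) - n_fft + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def spectral_convergence(target, estimate):
    # Frobenius-norm error of STFT magnitudes, relative to the target's magnitude.
    s, e = stft_mag(target), stft_mag(estimate)
    return np.linalg.norm(s - e) / np.linalg.norm(s)


def log_magnitude_loss(target, estimate, eps=1e-7):
    # Mean absolute difference of log STFT magnitudes (relative rather than absolute error).
    s, e = stft_mag(target), stft_mag(estimate)
    return np.mean(np.abs(np.log(s + eps) - np.log(e + eps)))


rng = np.random.default_rng(0)
n_freq, n_frames, strides = 80, 10, [4, 4, 4]  # 4*4*4 = 64 samples per frame
spec = np.abs(rng.normal(size=(n_freq, n_frames)))  # stand-in for a magnitude spectrogram
heads = [(make_head(rng, [n_freq, 32, 16, 1], strides), strides) for _ in range(4)]
wave = mcnn_forward(spec, heads, head_scales=np.ones(4))
print(wave.shape)  # (640,)
print(spectral_convergence(wave, wave))  # 0.0
```

Nothing in this forward pass iterates toward a solution the way Griffin-Lim alternates phase estimates, and no output sample waits on earlier ones as in an autoregressive model. The Python loop over frames is only a readable way to write the transposed convolution; a deep learning framework would run it as one batched kernel. This is the property the abstract credits for the high compute intensity.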

Keywords

convolutional neural network, speech recognition, fingerprint recognition

Cite

@article{arxiv.1808.06719,
  title  = {Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks},
  author = {Sercan O. Arik and Heewoo Jun and Gregory Diamos},
  journal= {arXiv preprint arXiv:1808.06719},
  year   = {2018}
}