Audio Spectrogram Representations for Processing with Convolutional Neural Networks

Sound | 2017-06-30 | v1 | Machine Learning, Multimedia, Neural and Evolutionary Computing

Abstract

One decision that arises when designing a neural network for any application is how the data should be represented in order to be presented to, and possibly generated by, the network. For audio, the choice is less obvious than it seems to be for visual images, and a variety of representations have been used for different applications, including the raw digitized sample stream, hand-crafted features, machine-discovered features, MFCCs and variants that include deltas, and a variety of spectral representations. This paper reviews some of these representations and the issues that arise, focusing particularly on spectrograms for generating audio using neural networks for style transfer.

Cite

@article{arxiv.1706.09559,
  title  = {Audio Spectrogram Representations for Processing with Convolutional Neural Networks},
  author = {L. Wyse},
  journal= {arXiv preprint arXiv:1706.09559},
  year   = {2017}
}

Comments

Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [cs.NE])
