CNN-LSTM models for Multi-Speaker Source Separation using Bayesian Hyper Parameter Optimization

Jeroen Zegers; Hugo Van hamme

Machine Learning | 2019-12-20 | v1 | Audio and Speech Processing; Machine Learning

Authors: Jeroen Zegers, Hugo Van hamme

Abstract

In recent years there have been many deep learning approaches to the multi-speaker source separation problem. Most use Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) or Convolutional Neural Networks (CNN) to model the sequential behavior of speech. In this paper we propose a novel network for source separation that uses an encoder-decoder CNN and an LSTM in parallel. Hyper parameters have to be chosen for both parts of the network, and they are potentially mutually dependent. Since hyper parameter grid search has a high computational burden, random search is often preferred. However, a newly sampled point in the hyper parameter space can lie very close to a previously evaluated point and thus give little additional information. Furthermore, random sampling is as likely to sample in a promising area as in an area of the hyper parameter space dominated by poorly performing models. Therefore, we use a Bayesian hyper parameter optimization technique and find that the parallel CNN-LSTM outperforms the LSTM-only and CNN-only models.
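The contrast with random search can be illustrated with a minimal sketch of Bayesian optimization. The abstract does not state which surrogate model, acquisition function, or search space the authors use, so everything below is an assumption made for illustration: a Gaussian process surrogate with a squared-exponential kernel, the expected improvement acquisition function, and a hypothetical one-dimensional `toy_loss` standing in for the validation loss of a separation model. Each new point is chosen where the surrogate predicts either a low loss or high uncertainty, rather than uniformly at random.

```python
import math
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_seen, y_seen, x_cand, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_cand."""
    k_ss = rbf_kernel(x_seen, x_seen) + noise * np.eye(len(x_seen))
    k_sc = rbf_kernel(x_seen, x_cand)
    solved = np.linalg.solve(k_ss, k_sc)          # K^{-1} k(x_seen, x_cand)
    mean = solved.T @ y_seen
    var = 1.0 - np.sum(k_sc * solved, axis=0)     # prior variance is 1
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (minimization)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def toy_loss(x):
    """Hypothetical validation loss over one normalized hyper parameter."""
    return (x - 0.3) ** 2 + 0.05 * np.sin(20.0 * x)


def bayes_opt(n_init=3, n_iter=12, seed=0):
    """Run a small Bayesian optimization loop on toy_loss over [0, 1]."""
    rng = np.random.default_rng(seed)
    x_seen = rng.uniform(0.0, 1.0, size=n_init)   # random initial design
    y_seen = toy_loss(x_seen)
    x_cand = np.linspace(0.0, 1.0, 201)           # candidate grid
    for _ in range(n_iter):
        # Standardize observations so the unit-variance prior is reasonable.
        y_norm = (y_seen - y_seen.mean()) / (y_seen.std() + 1e-12)
        mean, std = gp_posterior(x_seen, y_norm, x_cand)
        ei = expected_improvement(mean, std, y_norm.min())
        x_next = x_cand[np.argmax(ei)]            # most promising candidate
        x_seen = np.append(x_seen, x_next)
        y_seen = np.append(y_seen, toy_loss(x_next))
    best = np.argmin(y_seen)
    return x_seen[best], y_seen[best], x_seen


if __name__ == "__main__":
    best_x, best_y, evaluated = bayes_opt()
    print(f"best x = {best_x:.3f}, loss = {best_y:.4f}, "
          f"evaluations = {len(evaluated)}")
```

In a real setting the candidate grid would be replaced by a multi-dimensional space covering both the CNN and the LSTM hyper parameters, and each call to the loss would train and validate a full model, which is why spending evaluations only on informative points matters.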

Keywords

speech recognition, long short-term memory, natural language parsing

Cite

@article{arxiv.1912.09254,
  title  = {CNN-LSTM models for Multi-Speaker Source Separation using Bayesian Hyper Parameter Optimization},
  author = {Jeroen Zegers and Hugo Van hamme},
  journal= {arXiv preprint arXiv:1912.09254},
  year   = {2019}
}

Comments

Interspeech 2019
