CNN-LSTM models for Multi-Speaker Source Separation using Bayesian Hyper Parameter Optimization
Abstract
In recent years there have been many deep learning approaches to the multi-speaker source separation problem. Most use Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) or Convolutional Neural Networks (CNN) to model the sequential behavior of speech. In this paper we propose a novel network for source separation that uses an encoder-decoder CNN and an LSTM in parallel. Hyper parameters have to be chosen for both parts of the network, and they are potentially mutually dependent. Since a hyper parameter grid search has a high computational burden, random search is often preferred. However, a newly sampled point in the hyper parameter space can lie very close to a previously evaluated point and thus give little additional information. Furthermore, random sampling is as likely to sample in a promising area as in an area of the hyper parameter space dominated by poorly performing models. Therefore, we use a Bayesian hyper parameter optimization technique and find that the parallel CNN-LSTM outperforms the LSTM-only and CNN-only models.
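The contrast between random search and Bayesian optimization drawn in the abstract can be illustrated with a minimal sketch. The abstract does not say which surrogate model or acquisition function the authors used, so everything here is an assumption made for illustration: a zero-mean Gaussian process with an RBF kernel as the surrogate, expected improvement as the acquisition function, a single hyper parameter scaled to [0, 1], and a hypothetical `toy_loss` standing in for the validation loss of a trained separation model.

```python
import math
import random


def rbf(a, b, length=0.2):
    """RBF kernel between two scalar hyper parameter values (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)      # K^{-1} y
    v = solve(K, k_star)      # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 1e-12))


def expected_improvement(mean, std, best):
    """Expected improvement over the best loss so far (minimization)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def toy_loss(x):
    """Hypothetical stand-in for the validation loss of a trained model."""
    return (x - 0.3) ** 2


def bayes_opt(loss, n_init=3, n_iter=10, seed=0):
    """Pick each new point by maximizing expected improvement on a grid."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]   # random initial design
    ys = [loss(x) for x in xs]
    candidates = [i / 200 for i in range(201)]
    for _ in range(n_iter):
        best = min(ys)
        scores = [expected_improvement(*gp_posterior(xs, ys, c), best)
                  for c in candidates]
        x_next = candidates[scores.index(max(scores))]
        xs.append(x_next)
        ys.append(loss(x_next))
    return xs, ys


if __name__ == "__main__":
    xs, ys = bayes_opt(toy_loss)
    print("best hyper parameter:", xs[ys.index(min(ys))], "loss:", min(ys))
```

Unlike random search, each new point here depends on all previous evaluations. The surrogate's uncertainty shrinks near already-evaluated points, so expected improvement is low there, and its predicted mean steers sampling away from regions that have performed poorly. A real search over CNN and LSTM hyper parameters would be multi-dimensional and would typically rely on an established optimization library rather than this hand-written surrogate.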
Cite
@article{arxiv.1912.09254,
title = {CNN-LSTM models for Multi-Speaker Source Separation using Bayesian Hyper Parameter Optimization},
author = {Jeroen Zegers and Hugo Van hamme},
journal = {arXiv preprint arXiv:1912.09254},
year = {2019}
}
Comments
Interspeech 2019