Semi-supervised source localization with deep generative modeling

Michael J. Bianco; Sharon Gannot; Peter Gerstoft

doi:10.1109/MLSP49062.2020.9231825

Audio and Speech Processing; 2021-02-15; v3; Machine Learning; Sound; Signal Processing

Abstract

We propose a semi-supervised localization approach based on deep generative modeling with variational autoencoders (VAEs). Localization in reverberant environments remains a challenge, which machine learning (ML) has shown promise in addressing. Even with large data volumes, the number of labels available for supervised learning in reverberant environments is usually small. We address this issue by performing semi-supervised learning (SSL) with convolutional VAEs. The VAE is trained to generate the phase of relative transfer functions (RTFs), in parallel with a direction-of-arrival (DOA) classifier, on both labeled and unlabeled RTF samples. The VAE-SSL approach is compared with steered response power with phase transform (SRP-PHAT) and fully supervised convolutional neural networks (CNNs). We find that VAE-SSL can outperform both SRP-PHAT and CNNs in label-limited scenarios.
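
To make the input feature concrete, the following is a minimal sketch of how the phase of an RTF between two microphones might be estimated. The abstract does not state which estimator the authors use, so this assumes a common cross-spectral estimate (cross-spectrum divided by the reference auto-spectrum, averaged over frames). The function names `stft_frames` and `rtf_phase`, the frame length, the hop size, and the synthetic two-sample delay are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def stft_frames(x, n_fft=256, hop=128):
    """Hann-windowed short-time Fourier transform.

    Returns an array of shape (n_frames, n_fft // 2 + 1).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def rtf_phase(x_ref, x_other, n_fft=256, hop=128, eps=1e-12):
    """Phase of the RTF from the reference microphone to the other one.

    Assumed estimator (not confirmed by the source):
    RTF(f) ~= mean_t[X_other * conj(X_ref)] / mean_t[|X_ref|^2].
    """
    X_ref = stft_frames(x_ref, n_fft, hop)
    X_other = stft_frames(x_other, n_fft, hop)
    cross = np.mean(X_other * np.conj(X_ref), axis=0)  # cross-spectrum
    auto = np.mean(np.abs(X_ref) ** 2, axis=0)         # reference auto-spectrum
    return np.angle(cross / (auto + eps))              # wrapped to (-pi, pi]

# Synthetic example: the second channel is the first delayed by 2 samples,
# standing in for a free-field source (no reverberation).
rng = np.random.default_rng(0)
delay = 2
x1 = rng.standard_normal(16000)
x2 = np.concatenate([np.zeros(delay), x1[:-delay]])
phase = rtf_phase(x1, x2)  # one phase value per frequency bin
```

For a pure delay the phase falls linearly with frequency (before wrapping). In a reverberant room the phase pattern is more complicated, and a vector like `phase` is the kind of sample the abstract describes the VAE generating and the classifier mapping to a DOA.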

Keywords

variational autoencoder; self-supervised speech learning; computer vision and image classification

Cite

@article{arxiv.2005.13163,
  title  = {Semi-supervised source localization with deep generative modeling},
  author = {Michael J. Bianco and Sharon Gannot and Peter Gerstoft},
  journal= {arXiv preprint arXiv:2005.13163},
  year   = {2021}
}

Comments

Published in proceedings of IEEE International Workshop on Machine Learning for Signal Processing. arXiv admin note: substantial text overlap with arXiv:2101.10636
