Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function

Qing Wang; Hang Chen; Ya Jiang; Zhe Wang; Yuyang Wang; Jun Du; Chin-Hui Lee

Subjects: Audio and Speech Processing; Sound; Image and Video Processing. Submitted: 2022-10-27 (v1)

Abstract

In this paper, we propose a deep learning based method for multi-speaker direction of arrival (DOA) estimation that uses both audio and visual signals and is trained with a permutation-free loss function. We first collect a data set for multi-modal sound source localization (SSL) in which both audio and visual signals are recorded in real-life home TV scenarios. We then propose a novel spatial annotation method that produces the ground-truth DOA of each speaker from the video data by transforming between camera coordinates and pixel coordinates according to the pinhole camera model. With spatial location information serving as an additional input alongside the acoustic features, multi-speaker DOA estimation can be solved as an active speaker detection classification task. Because the location of each speaker is supplied as input, the label permutation problem that arises in multi-speaker tasks is avoided. Experiments conducted on both simulated and real data show that the proposed audio-visual DOA estimation model outperforms an audio-only DOA estimation model by a large margin.
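
As a rough illustration of the pinhole-camera annotation step mentioned in the abstract, the sketch below maps a speaker's horizontal pixel position to an azimuth angle. It is not the authors' implementation. The function name and the parameters cx (principal point column) and fx (focal length in pixels) are hypothetical, and the sketch assumes an undistorted image and a camera whose optical axis coincides with the 0 degree direction of the microphone array.

```python
import math

def pixel_to_azimuth_deg(u: float, cx: float, fx: float) -> float:
    """Azimuth (degrees) of the ray through pixel column u.

    Assumptions (not from the paper): ideal pinhole camera, no lens
    distortion, optical axis aligned with the array's 0-degree direction.
    u  : horizontal pixel coordinate of the speaker (e.g. face centre)
    cx : horizontal pixel coordinate of the principal point
    fx : focal length expressed in pixels
    """
    # In the pinhole model x / z = (u - cx) / fx, so the horizontal
    # angle from the optical axis is atan((u - cx) / fx).
    return math.degrees(math.atan2(u - cx, fx))

# Hypothetical camera: 1920-pixel-wide image, fx = 1000 pixels.
centre = pixel_to_azimuth_deg(960.0, cx=960.0, fx=1000.0)
right = pixel_to_azimuth_deg(1960.0, cx=960.0, fx=1000.0)
```

Under these assumptions a speaker at the image centre maps to 0 degrees, and a speaker fx pixels to the right of the centre maps to 45 degrees. A real setup would also need the camera's calibrated intrinsics and its offset relative to the microphone array.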

Keywords

direction-of-arrival estimation; audio signal processing; speaker recognition and verification

Cite

@article{arxiv.2210.14581,
  title  = {Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function},
  author = {Qing Wang and Hang Chen and Ya Jiang and Zhe Wang and Yuyang Wang and Jun Du and Chin-Hui Lee},
  journal= {arXiv preprint arXiv:2210.14581},
  year   = {2022}
}

Comments

5 pages, 3 figures, accepted by ISCSLP 2022
