VAE-based regularization for deep speaker embedding

Sound 2019-04-09 v1 Machine Learning Audio and Speech Processing

Abstract

Deep speaker embedding has achieved state-of-the-art performance in speaker recognition. A potential problem is that these embedded vectors (called 'x-vectors') are not Gaussian, which degrades performance with the widely used PLDA back-end scoring. In this paper, we propose a regularization approach based on a Variational Auto-Encoder (VAE). This model transforms x-vectors to a latent space where the mapped latent codes are more Gaussian and hence more suitable for PLDA scoring.
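To illustrate why a VAE pushes latent codes toward a Gaussian shape, here is a minimal sketch of the standard VAE regularizer: the KL divergence between the encoder's diagonal-Gaussian posterior and a standard normal prior, together with the reparameterized sampling step. This is the generic VAE term, not necessarily the exact objective used in the paper; the function names, the 3-dimensional latent, and the encoder outputs `mu` and `logvar` are all hypothetical values chosen for illustration.

```python
import math
import random

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent code."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))

def sample_latent(mu, logvar, rng):
    """Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

rng = random.Random(0)

# Hypothetical encoder output for a single x-vector (assumed values,
# 3-dimensional latent for brevity; a real encoder would be a neural network).
mu = [0.5, -1.0, 0.0]
logvar = [0.0, -0.2, 0.3]

z = sample_latent(mu, logvar, rng)      # latent code that would be passed to PLDA
kl = kl_to_standard_normal(mu, logvar)  # penalty that grows as the posterior leaves N(0, I)

# A posterior that already equals the prior incurs no penalty.
kl_at_prior = kl_to_standard_normal([0.0] * 3, [0.0] * 3)
```

Minimizing this penalty alongside a reconstruction loss is what encourages the latent codes to follow a distribution closer to the Gaussian assumed by PLDA.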

Cite

@article{arxiv.1904.03617,
  title  = {VAE-based regularization for deep speaker embedding},
  author = {Yang Zhang and Lantian Li and Dong Wang},
  journal= {arXiv preprint arXiv:1904.03617},
  year   = {2019}
}