Related papers: VAE-based Domain Adaptation for Speaker Verificati…

VAE-based regularization for deep speaker embedding

Deep speaker embedding has achieved state-of-the-art performance in speaker recognition. A potential problem is that these embedded vectors (called 'x-vectors') are not Gaussian, causing performance degradation with the famous PLDA back-end…

Sound · Computer Science 2019-04-09 Yang Zhang, Lantian Li, Dong Wang

Investigation of Using VAE for i-Vector Speaker Verification

A new system for i-vector speaker recognition based on a variational autoencoder (VAE) is investigated. VAE is a promising approach for developing accurate deep nonlinear generative models of complex data. Experiments show that VAE provides…

Sound · Computer Science 2017-05-26 Timur Pekhovsky, Maxim Korenevsky

A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model,…

Sound · Computer Science 2021-06-15 Xiaoyu Bie, Laurent Girin, Simon Leglaive, Thomas Hueber, Xavier Alameda-Pineda

Embedding-Based Speaker Adaptive Training of Deep Neural Networks

An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are a constant given a particular speaker,…

Computation and Language · Computer Science 2017-10-20 Xiaodong Cui, Vaibhava Goel, George Saon

Enhancing Zero-Shot Many to Many Voice Conversion with Self-Attention VAE

Variational auto-encoder (VAE) is an effective neural network architecture to disentangle a speech utterance into speaker identity and linguistic content latent embeddings, then generate an utterance for a target speaker from that of a…

Sound · Computer Science 2022-08-23 Ziang Long, Yunling Zheng, Meng Yu, Jack Xin

DEAAN: Disentangled Embedding and Adversarial Adaptation Network for Robust Speaker Representation Learning

Although speaker verification has achieved significant performance improvement with the development of deep neural networks, domain mismatch is still a challenging problem in this field. In this study, we propose a novel framework to…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-24 Mufan Sang, Wei Xia, John H. L. Hansen

Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion

An effective approach for voice conversion (VC) is to disentangle linguistic content from other components in the speech signal. The effectiveness of variational autoencoder (VAE) based VC (VAE-VC), for instance, strongly relies on this…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-09 Wen-Chin Huang, Hao Luo, Hsin-Te Hwang, Chen-Chou Lo, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang

Generative Adversarial Speaker Embedding Networks for Domain Robust End-to-End Speaker Verification

This article presents a novel approach for learning domain-invariant speaker embeddings using Generative Adversarial Networks. The main idea is to confuse a domain discriminator so that it can't tell if embeddings are from the source or…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-08 Gautam Bhattacharya, Joao Monteiro, Jahangir Alam, Patrick Kenny

Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders

An effective approach to non-parallel voice conversion (VC) is to utilize deep neural networks (DNNs), specifically variational auto encoders (VAEs), to model the latent structure of speech in an unsupervised manner. A previous study has…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-09 Wen-Chin Huang, Hsin-Te Hwang, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang

Leveraging speaker attribute information using multi task learning for speaker verification and diarization

Deep speaker embeddings have become the leading method for encoding speaker identity in speaker recognition tasks. The embedding space should ideally capture the variations between all possible speakers, encoding the multiple acoustic…

Sound · Computer Science 2021-04-26 Chau Luu, Peter Bell, Steve Renals

Self-Adaptive Soft Voice Activity Detection using Deep Neural Networks for Robust Speaker Verification

Voice activity detection (VAD), which classifies frames as speech or non-speech, is an important module in many speech applications including speaker verification. In this paper, we propose a novel method, called self-adaptive soft VAD, to…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-25 Youngmoon Jung, Yeunju Choi, Hoirin Kim

Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion

Though significant progress has been made for the voice conversion (VC) of typical speech, VC for atypical speech, e.g., dysarthric and second-language (L2) speech, remains a challenge, since it involves correcting for atypical prosody…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-26 Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng

An Adaptive X-vector Model for Text-independent Speaker Verification

In this paper, adaptive mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification. First, adaptive convolutional neural networks (ACNNs) are employed in frame-level embedding…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-18 Bin Gu, Wu Guo, Lirong Dai, Jun Du

Conditional Deep Hierarchical Variational Autoencoder for Voice Conversion

Variational autoencoder-based voice conversion (VAE-VC) has the advantage of requiring only pairs of speeches and speaker labels for training. Unlike the majority of the research in VAE-VC which focuses on utilizing auxiliary losses or…

Sound · Computer Science 2021-12-07 Kei Akuzawa, Kotaro Onishi, Keisuke Takiguchi, Kohki Mametani, Koichiro Mori

Adapting End-to-End Neural Speaker Verification to New Languages and Recording Conditions with Adversarial Training

In this article we propose a novel approach for adapting speaker embeddings to new domains based on adversarial training of neural networks. We apply our embeddings to the task of text-independent speaker verification, a challenging,…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-08 Gautam Bhattacharya, Jahangir Alam, Patrick Kenny

Online Speaker Adaptation for WaveNet-based Neural Vocoders

In this paper, we propose an online speaker adaptation method for WaveNet-based neural vocoders in order to improve their performance on speaker-independent waveform generation. In this method, a speaker encoder is first constructed using a…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-17 Qiuchen Huang, Yang Ai, Zhenhua Ling

Siamese x-vector reconstruction for domain adapted speaker recognition

With the rise of voice-activated applications, the need for speaker recognition is rapidly increasing. The x-vector, an embedding approach based on a deep neural network (DNN), is considered the state-of-the-art when proper end-to-end…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-29 Shai Rozenberg, Hagai Aronowitz, Ron Hoory

Cross-lingual Text-independent Speaker Verification using Unsupervised Adversarial Discriminative Domain Adaptation

Speaker verification systems often degrade significantly when there is a language mismatch between training and testing data. Being able to improve a cross-lingual speaker verification system using unlabeled data can greatly increase the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-03 Wei Xia, Jing Huang, John H. L. Hansen

Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder

Recently, a generative variational autoencoder (VAE) has been proposed for speech enhancement to model speech statistics. However, this approach only uses clean speech in the training phase, making the estimation particularly sensitive to…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-18 Huajian Fang, Guillaume Carbajal, Stefan Wermter, Timo Gerkmann

Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders

Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data. VAEs have been successfully used to learn a probabilistic prior over speech signals, which is then…

Sound · Computer Science 2020-12-18 Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud