Related papers: VAE-based Domain Adaptation for Speaker Verificati…

200 papers

Deep speaker embedding has achieved state-of-the-art performance in speaker recognition. A potential problem is that these embedded vectors (called `x-vectors') are not Gaussian, causing performance degradation with the famous PLDA back-end…

Sound · Computer Science 2019-04-09 Yang Zhang , Lantian Li , Dong Wang

A new system for i-vector speaker recognition based on a variational autoencoder (VAE) is investigated. VAE is a promising approach for developing accurate deep nonlinear generative models of complex data. Experiments show that VAE provides…

Sound · Computer Science 2017-05-26 Timur Pekhovsky , Maxim Korenevsky

The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model,…

Sound · Computer Science 2021-06-15 Xiaoyu Bie , Laurent Girin , Simon Leglaive , Thomas Hueber , Xavier Alameda-Pineda

An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are a constant given a particular speaker,…

Computation and Language · Computer Science 2017-10-20 Xiaodong Cui , Vaibhava Goel , George Saon

Variational auto-encoder (VAE) is an effective neural network architecture to disentangle a speech utterance into speaker identity and linguistic content latent embeddings, then generate an utterance for a target speaker from that of a…

Sound · Computer Science 2022-08-23 Ziang Long , Yunling Zheng , Meng Yu , Jack Xin

Although speaker verification has achieved significant performance improvement with the development of deep neural networks, domain mismatch is still a challenging problem in this field. In this study, we propose a novel framework to…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-24 Mufan Sang , Wei Xia , John H. L. Hansen

An effective approach for voice conversion (VC) is to disentangle linguistic content from other components in the speech signal. The effectiveness of variational autoencoder (VAE) based VC (VAE-VC), for instance, strongly relies on this…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-09 Wen-Chin Huang , Hao Luo , Hsin-Te Hwang , Chen-Chou Lo , Yu-Huai Peng , Yu Tsao , Hsin-Min Wang

This article presents a novel approach for learning domain-invariant speaker embeddings using Generative Adversarial Networks. The main idea is to confuse a domain discriminator so that it can't tell if embeddings are from the source or…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-08 Gautam Bhattacharya , Joao Monteiro , Jahangir Alam , Patrick Kenny

An effective approach to non-parallel voice conversion (VC) is to utilize deep neural networks (DNNs), specifically variational auto encoders (VAEs), to model the latent structure of speech in an unsupervised manner. A previous study has…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-09 Wen-Chin Huang , Hsin-Te Hwang , Yu-Huai Peng , Yu Tsao , Hsin-Min Wang

Deep speaker embeddings have become the leading method for encoding speaker identity in speaker recognition tasks. The embedding space should ideally capture the variations between all possible speakers, encoding the multiple acoustic…

Sound · Computer Science 2021-04-26 Chau Luu , Peter Bell , Steve Renals

Voice activity detection (VAD), which classifies frames as speech or non-speech, is an important module in many speech applications including speaker verification. In this paper, we propose a novel method, called self-adaptive soft VAD, to…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-25 Youngmoon Jung , Yeunju Choi , Hoirin Kim

Though significant progress has been made for the voice conversion (VC) of typical speech, VC for atypical speech, e.g., dysarthric and second-language (L2) speech, remains a challenge, since it involves correcting for atypical prosody…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-26 Disong Wang , Songxiang Liu , Lifa Sun , Xixin Wu , Xunying Liu , Helen Meng

In this paper, adaptive mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification. First, adaptive convolutional neural networks (ACNNs) are employed in frame-level embedding…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-18 Bin Gu , Wu Guo , Lirong Dai , Jun Du

Variational autoencoder-based voice conversion (VAE-VC) has the advantage of requiring only pairs of speeches and speaker labels for training. Unlike the majority of the research in VAE-VC which focuses on utilizing auxiliary losses or…

Sound · Computer Science 2021-12-07 Kei Akuzawa , Kotaro Onishi , Keisuke Takiguchi , Kohki Mametani , Koichiro Mori

In this article we propose a novel approach for adapting speaker embeddings to new domains based on adversarial training of neural networks. We apply our embeddings to the task of text-independent speaker verification, a challenging,…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-08 Gautam Bhattacharya , Jahangir Alam , Patrick Kenny

In this paper, we propose an online speaker adaptation method for WaveNet-based neural vocoders in order to improve their performance on speaker-independent waveform generation. In this method, a speaker encoder is first constructed using a…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-17 Qiuchen Huang , Yang Ai , Zhenhua Ling

With the rise of voice-activated applications, the need for speaker recognition is rapidly increasing. The x-vector, an embedding approach based on a deep neural network (DNN), is considered the state-of-the-art when proper end-to-end…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-29 Shai Rozenberg , Hagai Aronowitz , Ron Hoory

Speaker verification systems often degrade significantly when there is a language mismatch between training and testing data. Being able to improve cross-lingual speaker verification system using unlabeled data can greatly increase the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-03 Wei Xia , Jing Huang , John H. L. Hansen

Recently, a generative variational autoencoder (VAE) has been proposed for speech enhancement to model speech statistics. However, this approach only uses clean speech in the training phase, making the estimation particularly sensitive to…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-18 Huajian Fang , Guillaume Carbajal , Stefan Wermter , Timo Gerkmann

Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data. VAEs have been successfully used to learn a probabilistic prior over speech signals, which is then…

Sound · Computer Science 2020-12-18 Mostafa Sadeghi , Simon Leglaive , Xavier Alameda-Pineda , Laurent Girin , Radu Horaud