Related papers: Deep Speaker Vector Normalization with Maximum Gau…

Deep Normalization for Speaker Vectors

Deep speaker embedding has demonstrated state-of-the-art performance in speaker recognition tasks. However, one potential issue with this approach is that the speaker vectors derived from deep embedding models tend to be non-Gaussian for…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-03 Yunqi Cai, Lantian Li, Dong Wang, Andrew Abel

Gaussian-Constrained training for speaker verification

Neural models, in particular the d-vector and x-vector architectures, have produced state-of-the-art performance on many speaker verification tasks. However, two potential problems of these neural models deserve more investigation. Firstly,…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-19 Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

Recently, speaker embeddings extracted from a speaker discriminative deep neural network (DNN) yield better performance than the conventional methods such as i-vector. In most cases, the DNN speaker classifier is trained using cross entropy…

Audio and Speech Processing · Electrical Eng. & Systems 2019-06-19 Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian, Kai Yu

Novel Cascaded Gaussian Mixture Model-Deep Neural Network Classifier for Speaker Identification in Emotional Talking Environments

This research is an effort to present an effective approach to enhance text-independent speaker identification performance in emotional talking environments based on novel classifier called cascaded Gaussian Mixture Model-Deep Neural…

Sound · Computer Science 2018-10-12 Ismail Shahin, Ali Bou Nassif, Shibani Hamsa

SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement

This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE). For this, a DNN is trained to synthesize the enhanced speech…

Audio and Speech Processing · Electrical Eng. & Systems 2023-12-05 Martin Strauss, Nicola Pia, Nagashree K. S. Rao, Bernd Edler

Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes

Multi-speaker speech synthesis is a technique for modeling multiple speakers' voices with a single model. Although many approaches using deep neural networks (DNNs) have been proposed, DNNs are prone to overfitting when the amount of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari

An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales

This paper presents an improved deep embedding learning method based on convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) Multi-scale convolution…

Audio and Speech Processing · Electrical Eng. & Systems 2020-01-15 Bin Gu, Wu Guo

Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks

Recent studies have shown that deep neural networks (DNNs) perform significantly better than shallow networks and Gaussian mixture models (GMMs) on large vocabulary speech recognition tasks. In this paper, we argue that the improved…

Machine Learning · Computer Science 2018-12-06 Dong Yu, Michael L. Seltzer, Jinyu Li, Jui-Ting Huang, Frank Seide

An iterative framework for self-supervised deep speaker representation learning

In this paper, we propose an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). The framework starts with training a self-supervision speaker embedding network by maximizing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-29 Danwei Cai, Weiqing Wang, Ming Li

Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models

In this paper we investigate the GMM-derived (GMMD) features for adaptation of deep neural network (DNN) acoustic models. The adaptation of the DNN trained on GMMD features is done through the maximum a posteriori (MAP) adaptation of the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-03-17 Natalia Tomashenko, Yuri Khokhlov, Yannick Esteve

Speaker diarization with session-level speaker embedding refinement using graph neural networks

Deep speaker embedding models have been commonly used as a building block for speaker diarization systems; however, the speaker embedding model is usually trained according to a global loss defined on the training data, which could be…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-26 Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno

Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification

The x-vector based deep neural network (DNN) embedding systems have demonstrated effectiveness for text-independent speaker verification. This paper presents a multi-task learning architecture for training the speaker embedding DNN with the…

Audio and Speech Processing · Electrical Eng. & Systems 2019-04-05 Lanhua You, Wu Guo, Lirong Dai, Jun Du

Gaussian speaker embedding learning for text-independent speaker verification

The x-vector maps segments of arbitrary duration to vectors of fixed dimension using deep neural network. Combined with the probabilistic linear discriminant analysis (PLDA) backend, the x-vector/PLDA has become the dominant framework in…

Audio and Speech Processing · Electrical Eng. & Systems 2020-01-15 Bin Gu, Wu Guo

Cross-lingual Speaker Verification with Deep Feature Learning

Existing speaker verification (SV) systems often suffer from performance degradation if there is any language mismatch between model training, speaker enrollment, and test. A major cause of this degradation is that most existing SV methods…

Sound · Computer Science 2017-06-27 Lantian Li, Dong Wang, Askar Rozi, Thomas Fang Zheng

Linearly Constrained Deep Beamformer for Multi-Speaker Scenarios

We propose a deep beamforming framework for enhancing target speaker(s) in multi-speaker environments. A deep neural network (DNN) is trained to estimate beamforming weights directly from noisy multichannel inputs while satisfying linear…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-21 Ilai Zaidel, Ori Engel, Bar Engel, Sharon Gannot

Normalized Features for Improving the Generalization of DNN Based Speech Enhancement

Enhancing noisy speech is an important task to restore its quality and to improve its intelligibility. In traditional non-machine-learning (ML) based approaches the parameters required for noise reduction are estimated blindly from the…

Sound · Computer Science 2018-01-16 Robert Rehr, Timo Gerkmann

Improved Deep Speaker Feature Learning for Text-Dependent Speaker Recognition

A deep learning approach has been proposed recently to derive speaker identities (d-vector) by a deep neural network (DNN). This approach has been applied to text-dependent speaker recognition tasks and shows reasonable performance gains…

Computation and Language · Computer Science 2015-06-30 Lantian Li, Yiye Lin, Zhiyong Zhang, Dong Wang

A Novel Minimum Divergence Approach to Robust Speaker Identification

In this work, a novel solution to the speaker identification problem is proposed through minimization of statistical divergences between the probability distribution (g) of feature vectors from the test utterance and the probability…

Machine Learning · Statistics 2015-12-17 Ayanendranath Basu, Smarajit Bose, Amita Pal, Anish Mukherjee, Debasmita Das

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis

This paper proposes novel algorithms for speaker embedding using subjective inter-speaker similarity based on deep neural networks (DNNs). Although conventional DNN-based speaker embedding such as a $d$-vector can be applied to…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-22 Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization

The advent of large-scale pre-trained language models has contributed greatly to the recent progress in natural language processing. Many state-of-the-art language models are first trained on a large text corpus and then fine-tuned on…

Computation and Language · Computer Science 2023-11-13 Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo