Related papers: Deep Speaker Vector Normalization with Maximum Gau…
Deep speaker embedding has demonstrated state-of-the-art performance in speaker recognition tasks. However, one potential issue with this approach is that the speaker vectors derived from deep embedding models tend to be non-Gaussian for…
Neural models, in particular the d-vector and x-vector architectures, have produced state-of-the-art performance on many speaker verification tasks. However, two potential problems of these neural models deserve more investigation. Firstly,…
Recently, speaker embeddings extracted from a speaker discriminative deep neural network (DNN) yield better performance than the conventional methods such as i-vector. In most cases, the DNN speaker classifier is trained using cross entropy…
This research is an effort to present an effective approach to enhance text-independent speaker identification performance in emotional talking environments based on novel classifier called cascaded Gaussian Mixture Model-Deep Neural…
This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE). For this, a DNN is trained to synthesize the enhanced speech…
Multi-speaker speech synthesis is a technique for modeling multiple speakers' voices with a single model. Although many approaches using deep neural networks (DNNs) have been proposed, DNNs are prone to overfitting when the amount of…
This paper presents an improved deep embedding learning method based on convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) Multi-scale convolution…
Recent studies have shown that deep neural networks (DNNs) perform significantly better than shallow networks and Gaussian mixture models (GMMs) on large vocabulary speech recognition tasks. In this paper, we argue that the improved…
In this paper, we propose an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). The framework starts with training a self-supervision speaker embedding network by maximizing…
In this paper we investigate the GMM-derived (GMMD) features for adaptation of deep neural network (DNN) acoustic models. The adaptation of the DNN trained on GMMD features is done through the maximum a posteriori (MAP) adaptation of the…
Deep speaker embedding models have been commonly used as a building block for speaker diarization systems; however, the speaker embedding model is usually trained according to a global loss defined on the training data, which could be…
The x-vector based deep neural network (DNN) embedding systems have demonstrated effectiveness for text-independent speaker verification. This paper presents a multi-task learning architecture for training the speaker embedding DNN with the…
The x-vector maps segments of arbitrary duration to vectors of fixed dimension using deep neural network. Combined with the probabilistic linear discriminant analysis (PLDA) backend, the x-vector/PLDA has become the dominant framework in…
Existing speaker verification (SV) systems often suffer from performance degradation if there is any language mismatch between model training, speaker enrollment, and test. A major cause of this degradation is that most existing SV methods…
We propose a deep beamforming framework for enhancing target speaker(s) in multi-speaker environments. A deep neural network (DNN) is trained to estimate beamforming weights directly from noisy multichannel inputs while satisfying linear…
Enhancing noisy speech is an important task to restore its quality and to improve its intelligibility. In traditional non-machine-learning (ML) based approaches the parameters required for noise reduction are estimated blindly from the…
A deep learning approach has been proposed recently to derive speaker identifies (d-vector) by a deep neural network (DNN). This approach has been applied to text-dependent speaker recognition tasks and shows reasonable performance gains…
In this work, a novel solution to the speaker identification problem is proposed through minimization of statistical divergences between the probability distribution (g). of feature vectors from the test utterance and the probability…
This paper proposes novel algorithms for speaker embedding using subjective inter-speaker similarity based on deep neural networks (DNNs). Although conventional DNN-based speaker embedding such as a $d$-vector can be applied to…
The advent of large-scale pre-trained language models has contributed greatly to the recent progress in natural language processing. Many state-of-the-art language models are first trained on a large text corpus and then fine-tuned on…