Related papers: Deep Normalization for Speaker Vectors

Deep Speaker Vector Normalization with Maximum Gaussianality Training

Deep speaker embedding represents the state-of-the-art technique for speaker recognition. A key problem with this approach is that the resulting deep speaker vectors tend to be irregularly distributed. In previous research, we proposed a…

Sound · Computer Science 2020-11-02 Yunqi Cai, Lantian Li, Dong Wang, Andrew Abel

Improved Deep Speaker Feature Learning for Text-Dependent Speaker Recognition

A deep learning approach has been proposed recently to derive speaker identities (d-vector) by a deep neural network (DNN). This approach has been applied to text-dependent speaker recognition tasks and shows reasonable performance gains…

Computation and Language · Computer Science 2015-06-30 Lantian Li, Yiye Lin, Zhiyong Zhang, Dong Wang

An iterative framework for self-supervised deep speaker representation learning

In this paper, we propose an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). The framework starts with training a self-supervision speaker embedding network by maximizing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-29 Danwei Cai, Weiqing Wang, Ming Li

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

Recently, speaker embeddings extracted from a speaker discriminative deep neural network (DNN) yield better performance than the conventional methods such as i-vector. In most cases, the DNN speaker classifier is trained using cross entropy…

Audio and Speech Processing · Electrical Eng. & Systems 2019-06-19 Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian, Kai Yu

ECAPA-TDNN Embeddings for Speaker Diarization

Learning robust speaker embeddings is a crucial step in speaker diarization. Deep neural networks can accurately capture speaker discriminative characteristics and popular deep embeddings such as x-vectors are nowadays a fundamental…

Audio and Speech Processing · Electrical Eng. & Systems 2021-09-14 Nauman Dawalatabad, Mirco Ravanelli, François Grondin, Jenthe Thienpondt, Brecht Desplanques, Hwidong Na

Gaussian-Constrained training for speaker verification

Neural models, in particular the d-vector and x-vector architectures, have produced state-of-the-art performance on many speaker verification tasks. However, two potential problems of these neural models deserve more investigation. Firstly,…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-19 Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang

An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales

This paper presents an improved deep embedding learning method based on convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) Multi-scale convolution…

Audio and Speech Processing · Electrical Eng. & Systems 2020-01-15 Bin Gu, Wu Guo

Deep Speaker Embedding Learning with Multi-Level Pooling for Text-Independent Speaker Verification

This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both time delay neural network (TDNN) and long short-term memory neural…

Computation and Language · Computer Science 2019-02-22 Yun Tang, Guohong Ding, Jing Huang, Xiaodong He, Bowen Zhou

Parameterized Channel Normalization for Far-field Deep Speaker Verification

We address far-field speaker verification with a deep neural network (DNN) based speaker embedding extractor, where mismatch between enrollment and test data often comes from convolutive effects (e.g. room reverberation) and noise. To…

Sound · Computer Science 2021-09-27 Xuechen Liu, Md Sahidullah, Tomi Kinnunen

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis

This paper proposes novel algorithms for speaker embedding using subjective inter-speaker similarity based on deep neural networks (DNNs). Although conventional DNN-based speaker embedding such as a $d$-vector can be applied to…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-22 Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

How to Improve Your Speaker Embeddings Extractor in Generic Toolkits

Recently, speaker embeddings extracted with deep neural networks became the state-of-the-art method for speaker verification. In this paper we aim to facilitate its implementation on a more generic toolkit than Kaldi, which we anticipate to…

Sound · Computer Science 2018-11-07 Hossein Zeinali, Lukas Burget, Johan Rohdin, Themos Stafylakis, Jan Cernocky

Deep Discriminant Analysis for i-vector Based Robust Speaker Recognition

Linear Discriminant Analysis (LDA) has been used as a standard post-processing procedure in many state-of-the-art speaker recognition tasks. Through maximizing the inter-speaker difference and minimizing the intra-speaker variation, LDA…

Sound · Computer Science 2018-05-04 Shuai Wang, Zili Huang, Yanmin Qian, Kai Yu

Deep Speaker: an End-to-End Neural Speaker Embedding System

We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity. The embeddings generated by Deep Speaker can be used for many tasks, including…

Computation and Language · Computer Science 2017-05-08 Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li, Xuewei Zhang, Xiao Liu, Ying Cao, Ajay Kannan, Zhenyao Zhu

DNN Speaker Tracking with Embeddings

In multi-speaker applications it is common to have pre-computed models from enrolled speakers. Using these models to identify the instances in which these speakers intervene in a recording is the task of speaker tracking. In this paper, we…

Sound · Computer Science 2020-07-21 Carlos Rodrigo Castillo-Sanchez, Leibny Paola Garcia-Perera, Anabel Martin-Gonzalez

A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. While there are alternative feature extraction methods based on phase, prosody and long-term…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-31 Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings

In this paper we propose a new method of speaker diarization that employs a deep learning architecture to learn speaker embeddings. In contrast to the traditional approaches that build their speaker embeddings using manually hand-crafted…

Sound · Computer Science 2017-09-18 Pawel Cyrta, Tomasz Trzciński, Wojciech Stokowiec

Robust Speech Representation Learning via Flow-based Embedding Regularization

Over the recent years, various deep learning-based methods were proposed for extracting a fixed-dimensional embedding vector from speech signals. Although the deep learning-based embedding extraction methods have shown good performance in…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-08 Woo Hyun Kang, Jahangir Alam, Abderrahim Fathan

Disentangled speaker and nuisance attribute embedding for robust speaker verification

Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Woo Hyun Kang, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

A Framework for Robust Speaker Verification in Highly Noisy Environments Leveraging Both Noisy and Enhanced Audio

Recent advancements in speaker verification techniques show promise, but their performance often deteriorates significantly in challenging acoustic environments. Although speech enhancement methods can improve perceived audio quality, they…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-27 Adam Katav, Yair Moshe, Israel Cohen

Optimized Power Normalized Cepstral Coefficients towards Robust Deep Speaker Verification

After their introduction to robust speech recognition, power normalized cepstral coefficient (PNCC) features were successfully adopted to other tasks, including speaker verification. However, as a feature extractor with long-term operations…

Sound · Computer Science 2021-09-27 Xuechen Liu, Md Sahidullah, Tomi Kinnunen