Related papers: What Does the Speaker Embedding Encode?

Probing Deep Speaker Embeddings for Speaker-related Tasks

Deep speaker embeddings have shown promising results in speaker recognition, as well as in other speaker-related tasks. However, some issues are still underexplored, for instance, the information encoded in these representations and their…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-15 Zifeng Zhao, Ding Pan, Junyi Peng, Rongzhi Gu

S-vectors and TESA: Speaker Embeddings and a Speaker Authenticator Based on Transformer Encoder

One of the most popular speaker embeddings is x-vectors, which are obtained from an architecture that gradually builds a larger temporal context with layers. In this paper, we propose to derive speaker embeddings from Transformer's encoder…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-14 N J Metilda Sagaya Mary, S Umesh, Sandesh V Katta

Probing the Information Encoded in X-vectors

Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks. In this paper, we use simple classifiers to investigate the contents encoded by…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-16 Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur

Speaker Diarization with LSTM

For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications. However, mirroring the rise of deep learning in various domains, neural network based audio…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-25 Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno

Interpreting the Dimensions of Speaker Embedding Space

Speaker embeddings are widely used in speaker verification systems and other applications where it is useful to characterise the voice of a speaker with a fixed-length vector. These embeddings tend to be treated as "black box" encodings,…

Sound · Computer Science 2025-10-21 Mark Huckvale

DGC-vector: A new speaker embedding for zero-shot voice conversion

Recently, more and more zero-shot voice conversion algorithms have been proposed. As a fundamental part of zero-shot voice conversion, speaker embeddings are the key to improving the converted speech's speaker similarity. In this paper, we…

Sound · Computer Science 2022-03-21 Ruitong Xiao, Haitong Zhang, Yue Lin

Attention Mechanism in Speaker Recognition: What Does It Learn in Deep Speaker Embedding?

This paper presents an experimental study on deep speaker embedding with an attention mechanism that has been found to be a powerful representation learning technique in speaker recognition. In this framework, an attention model works as a…

Sound · Computer Science 2018-09-26 Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Hitoshi Yamamoto, Takafumi Koshinaka

On Feature Importance and Interpretability of Speaker Representations

Unsupervised speech disentanglement aims at separating fast varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-20 Frederik Rautenberg, Michael Kuhlmann, Jana Wiechmann, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach

Y-Vector: Multiscale Waveform Encoder for Speaker Embedding

State-of-the-art text-independent speaker verification systems typically use cepstral features or filter bank energies as speech features. Recent studies attempted to extract speaker embeddings directly from raw waveforms and have shown…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-10 Ge Zhu, Fei Jiang, Zhiyao Duan

Residual Information in Deep Speaker Embedding Architectures

Speaker embeddings represent a means to extract representative vectorial representations from a speech signal such that the representation pertains to the speaker identity alone. The embeddings are commonly used to classify and discriminate…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-07 Adriana Stan

Disentangled speaker and nuisance attribute embedding for robust speaker verification

Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Woo Hyun Kang, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

DSARSR: Deep Stacked Auto-encoders Enhanced Robust Speaker Recognition

Speaker recognition is a biometric modality that utilizes the speaker's speech segments to recognize the identity, determining whether the test speaker belongs to one of the enrolled speakers. In order to improve the robustness of the…

Sound · Computer Science 2023-07-07 Zhifeng Wang, Chunyan Zeng, Surong Duan, Hongjie Ouyang, Hongmin Xu

Deep Speaker Vectors for Semi Text-independent Speaker Verification

Recent research shows that deep neural networks (DNNs) can be used to extract deep speaker vectors (d-vectors) that preserve speaker characteristics and can be used in speaker verification. This new method has been tested on text-dependent…

Computation and Language · Computer Science 2015-05-26 Lantian Li, Dong Wang, Zhiyong Zhang, Thomas Fang Zheng

Improving Embedding Extraction for Speaker Verification with Ladder Network

Speaker verification is an established yet challenging task in speech processing and a very vibrant research area. Recent speaker verification (SV) systems rely on deep neural networks to extract high-level embeddings which are able to…

Audio and Speech Processing · Electrical Eng. & Systems 2020-03-23 Fei Tao, Gokhan Tur

Combination of Deep Speaker Embeddings for Diarisation

Significant progress has recently been made in speaker diarisation after the introduction of d-vectors as speaker embeddings extracted from neural network (NN) speaker classifiers for clustering speech segments. To extract better-performing…

Sound · Computer Science 2021-05-10 Guangzhi Sun, Chao Zhang, Phil Woodland

H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model

In this paper, a hierarchical attention network to generate utterance-level embeddings (H-vectors) for speaker identification is proposed. Since different parts of an utterance may have different contributions to speaker identities, the use…

Computation and Language · Computer Science 2019-10-22 Yanpei Shi, Qiang Huang, Thomas Hain

Speaker diarisation using 2D self-attentive combination of embeddings

Speaker diarisation systems often cluster audio segments using speaker embeddings such as i-vectors and d-vectors. Since different types of embeddings are often complementary, this paper proposes a generic framework to improve performance…

Computation and Language · Computer Science 2019-02-11 Guangzhi Sun, Chao Zhang, Phil Woodland

Neural i-vectors

Deep speaker embeddings have been demonstrated to outperform their generative counterparts, i-vectors, in recent speaker verification evaluations. To combine the benefits of high performance and generative interpretation, we investigate the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-21 Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis

This paper proposes novel algorithms for speaker embedding using subjective inter-speaker similarity based on deep neural networks (DNNs). Although conventional DNN-based speaker embedding such as a d-vector can be applied to…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-22 Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

Analyzing And Improving Neural Speaker Embeddings for ASR

Neural speaker embeddings encode the speaker's speech characteristics through a DNN model and are prevalent for speaker verification tasks. However, few studies have investigated the usage of neural speaker embeddings for an ASR system. In…

Computation and Language · Computer Science 2023-09-21 Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney