Related papers: Ordered and Binary Speaker Embedding

Binary Speaker Embedding

The popular i-vector model represents speakers as low-dimensional continuous vectors (i-vectors), and hence it is a way of continuous speaker embedding. In this paper, we investigate binary speaker embedding, which transforms i-vectors to…

Sound · Computer Science 2016-04-01 Lantian Li, Dong Wang, Chao Xing, Kaimin Yu, Thomas Fang Zheng

Binary Neural Network for Speaker Verification

Although deep neural networks are successful for many tasks in the speech domain, the high computational and memory costs of deep neural networks make it difficult to directly deploy high-performance neural network systems on low-resource…

Sound · Computer Science 2021-04-07 Tinglong Zhu, Xiaoyi Qin, Ming Li

H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model

In this paper, a hierarchical attention network to generate utterance-level embeddings (H-vectors) for speaker identification is proposed. Since different parts of an utterance may have different contributions to speaker identities, the use…

Computation and Language · Computer Science 2019-10-22 Yanpei Shi, Qiang Huang, Thomas Hain

Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity

The objective of this work is to train noise-robust speaker embeddings adapted for speaker diarisation. Speaker embeddings play a crucial role in the performance of diarisation systems, but they often capture spurious information such as…

Sound · Computer Science 2022-11-04 You Jin Kim, Hee-Soo Heo, Jee-weon Jung, Youngki Kwon, Bong-Jin Lee, Joon Son Chung

Disentangled speaker and nuisance attribute embedding for robust speaker verification

Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Woo Hyun Kang, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

U-vectors: Generating clusterable speaker embedding from unlabeled data

Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages: the first stage extracts low-dimensional correlation embeddings from speech, and the second performs the…

Sound · Computer Science 2021-10-25 M. F. Mridha, Abu Quwsar Ohi, Muhammad Mostafa Monowar, Md. Abdul Hamid, Md. Rashedul Islam, Yutaka Watanobe

Interpreting the Dimensions of Speaker Embedding Space

Speaker embeddings are widely used in speaker verification systems and other applications where it is useful to characterise the voice of a speaker with a fixed-length vector. These embeddings tend to be treated as "black box" encodings,…

Sound · Computer Science 2025-10-21 Mark Huckvale

Magnitude-aware Probabilistic Speaker Embeddings

Recently, hyperspherical embeddings have established themselves as a dominant technique for face and voice recognition. Specifically, Euclidean space vector embeddings are learned to encode person-specific information in their direction…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-25 Nikita Kuzmin, Igor Fedorov, Alexey Sholokhov

DNN Speaker Tracking with Embeddings

In multi-speaker applications it is common to have pre-computed models from enrolled speakers. Using these models to identify the instances in which these speakers intervene in a recording is the task of speaker tracking. In this paper, we…

Sound · Computer Science 2020-07-21 Carlos Rodrigo Castillo-Sanchez, Leibny Paola Garcia-Perera, Anabel Martin-Gonzalez

What Does the Speaker Embedding Encode?

Developing a good speaker embedding has received tremendous interest in the speech community, with representations such as i-vector and d-vector demonstrating remarkable performance across various tasks. Despite their widespread adoption, a…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-23 Shuai Wang, Yanmin Qian, Kai Yu

Speaker Re-identification with Speaker Dependent Speech Enhancement

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here speech enhancement methods have traditionally allowed improved…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi, Qiang Huang, Thomas Hain

Analyzing And Improving Neural Speaker Embeddings for ASR

Neural speaker embeddings encode the speaker's speech characteristics through a DNN model and are prevalent for speaker verification tasks. However, few studies have investigated the usage of neural speaker embeddings for an ASR system. In…

Computation and Language · Computer Science 2023-09-21 Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

Near-lossless Binarization of Word Embeddings

Word embeddings are commonly used as a starting point in many NLP models to achieve state-of-the-art performances. However, with a large vocabulary and many dimensions, these floating-point representations are expensive both in terms of…

Computation and Language · Computer Science 2020-01-23 Julien Tissier, Christophe Gravier, Amaury Habrard

Improved Audio Embeddings by Adjacency-Based Clustering with Applications in Spoken Term Detection

Embedding audio signal segments into vectors with fixed dimensionality is attractive because all following processing will be easier and more efficient, for example modeling, classifying or indexing. Audio Word2Vec previously proposed was…

Computation and Language · Computer Science 2018-11-08 Sung-Feng Huang, Yi-Chen Chen, Hung-yi Lee, Lin-shan Lee

Self-supervised speaker embeddings

Contrary to i-vectors, speaker embeddings such as x-vectors are incapable of leveraging unlabelled utterances, due to the classification loss over training speakers. In this paper, we explore an alternative training strategy to enable the…

Computer Vision and Pattern Recognition · Computer Science 2019-04-24 Themos Stafylakis, Johan Rohdin, Oldrich Plchot, Petr Mizera, Lukas Burget

On Feature Importance and Interpretability of Speaker Representations

Unsupervised speech disentanglement aims at separating fast varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-20 Frederik Rautenberg, Michael Kuhlmann, Jana Wiechmann, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach

Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers

We propose a new method for speaker diarization that can handle overlapping speech with 2+ people. Our method is based on compositional embeddings [1]: Like standard speaker embedding methods such as x-vector [2], compositional embedding…

Sound · Computer Science 2021-02-11 Zeqian Li, Jacob Whitehill

Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition

Many neural network speaker recognition systems model each speaker using a fixed-dimensional embedding vector. These embeddings are generally compared using either linear or 2nd-order scoring and, until recently, do not handle…

Computation and Language · Computer Science 2022-03-14 Jason Pelecanos, Quan Wang, Ignacio Lopez Moreno

Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings

In this paper we propose a new method of speaker diarization that employs a deep learning architecture to learn speaker embeddings. In contrast to the traditional approaches that build their speaker embeddings using manually hand-crafted…

Sound · Computer Science 2017-09-18 Pawel Cyrta, Tomasz Trzciński, Wojciech Stokowiec

Improving Speaker Identification for Shared Devices by Adapting Embeddings to Speaker Subsets

Speaker identification typically involves three stages. First, a front-end speaker embedding model is trained to embed utterance and speaker profiles. Second, a scoring function is applied between a runtime utterance and each speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-22 Zhenning Tan, Yuguang Yang, Eunjung Han, Andreas Stolcke