Related papers: Ordered and Binary Speaker Embedding

200 papers

The popular i-vector model represents speakers as low-dimensional continuous vectors (i-vectors), and hence it is a way of continuous speaker embedding. In this paper, we investigate binary speaker embedding, which transforms i-vectors to…

Sound · Computer Science 2016-04-01 Lantian Li, Dong Wang, Chao Xing, Kaimin Yu, Thomas Fang Zheng

Although deep neural networks are successful for many tasks in the speech domain, the high computational and memory costs of deep neural networks make it difficult to directly deploy high-performance neural network systems on low-resource…

Sound · Computer Science 2021-04-07 Tinglong Zhu, Xiaoyi Qin, Ming Li

In this paper, a hierarchical attention network to generate utterance-level embeddings (H-vectors) for speaker identification is proposed. Since different parts of an utterance may have different contributions to speaker identities, the use…

Computation and Language · Computer Science 2019-10-22 Yanpei Shi, Qiang Huang, Thomas Hain

The objective of this work is to train noise-robust speaker embeddings adapted for speaker diarisation. Speaker embeddings play a crucial role in the performance of diarisation systems, but they often capture spurious information such as…

Sound · Computer Science 2022-11-04 You Jin Kim, Hee-Soo Heo, Jee-weon Jung, Youngki Kwon, Bong-Jin Lee, Joon Son Chung

Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Woo Hyun Kang, Sung Hwan Mun, Min Hyun Han, Nam Soo Kim

Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages, the first stage extracts low dimensional correlation embeddings from speech, and the second performs the…

Speaker embeddings are widely used in speaker verification systems and other applications where it is useful to characterise the voice of a speaker with a fixed-length vector. These embeddings tend to be treated as "black box" encodings,…

Sound · Computer Science 2025-10-21 Mark Huckvale

Recently, hyperspherical embeddings have established themselves as a dominant technique for face and voice recognition. Specifically, Euclidean space vector embeddings are learned to encode person-specific information in their direction…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-25 Nikita Kuzmin, Igor Fedorov, Alexey Sholokhov

In multi-speaker applications it is common to have pre-computed models from enrolled speakers. Using these models to identify the instances in which these speakers intervene in a recording is the task of speaker tracking. In this paper, we…

Developing a good speaker embedding has received tremendous interest in the speech community, with representations such as i-vector and d-vector demonstrating remarkable performance across various tasks. Despite their widespread adoption, a…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-23 Shuai Wang, Yanmin Qian, Kai Yu

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here speech enhancement methods have traditionally allowed improved…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi, Qiang Huang, Thomas Hain

Neural speaker embeddings encode the speaker's speech characteristics through a DNN model and are prevalent for speaker verification tasks. However, few studies have investigated the usage of neural speaker embeddings for an ASR system. In…

Computation and Language · Computer Science 2023-09-21 Christoph Lüscher, Jingjing Xu, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

Word embeddings are commonly used as a starting point in many NLP models to achieve state-of-the-art performances. However, with a large vocabulary and many dimensions, these floating-point representations are expensive both in terms of…

Computation and Language · Computer Science 2020-01-23 Julien Tissier, Christophe Gravier, Amaury Habrard

Embedding audio signal segments into vectors with fixed dimensionality is attractive because all following processing will be easier and more efficient, for example modeling, classifying or indexing. Audio Word2Vec previously proposed was…

Computation and Language · Computer Science 2018-11-08 Sung-Feng Huang, Yi-Chen Chen, Hung-yi Lee, Lin-shan Lee

Contrary to i-vectors, speaker embeddings such as x-vectors are incapable of leveraging unlabelled utterances, due to the classification loss over training speakers. In this paper, we explore an alternative training strategy to enable the…

Computer Vision and Pattern Recognition · Computer Science 2019-04-24 Themos Stafylakis, Johan Rohdin, Oldrich Plchot, Petr Mizera, Lukas Burget

Unsupervised speech disentanglement aims at separating fast varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-20 Frederik Rautenberg, Michael Kuhlmann, Jana Wiechmann, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach

We propose a new method for speaker diarization that can handle overlapping speech with 2+ people. Our method is based on compositional embeddings [1]: Like standard speaker embedding methods such as x-vector [2], compositional embedding…

Sound · Computer Science 2021-02-11 Zeqian Li, Jacob Whitehill

Many neural network speaker recognition systems model each speaker using a fixed-dimensional embedding vector. These embeddings are generally compared using either linear or 2nd-order scoring and, until recently, do not handle…

Computation and Language · Computer Science 2022-03-14 Jason Pelecanos, Quan Wang, Ignacio Lopez Moreno

In this paper we propose a new method of speaker diarization that employs a deep learning architecture to learn speaker embeddings. In contrast to the traditional approaches that build their speaker embeddings using manually hand-crafted…

Sound · Computer Science 2017-09-18 Pawel Cyrta, Tomasz Trzciński, Wojciech Stokowiec

Speaker identification typically involves three stages. First, a front-end speaker embedding model is trained to embed utterance and speaker profiles. Second, a scoring function is applied between a runtime utterance and each speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-22 Zhenning Tan, Yuguang Yang, Eunjung Han, Andreas Stolcke