Related papers: Self-supervised speaker embeddings

U-vectors: Generating clusterable speaker embedding from unlabeled data

Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages, the first stage extracts low dimensional correlation embeddings from speech, and the second performs the…

Sound · Computer Science 2021-10-25 M. F. Mridha , Abu Quwsar Ohi , Muhammad Mostafa Monowar , Md. Abdul Hamid , Md. Rashedul Islam , Yutaka Watanobe

State-of-the-art Embeddings with Video-free Segmentation of the Source VoxCeleb Data

In this paper, we refine and validate our method for training speaker embedding extractors using weak annotations. More specifically, we use only the audio stream of the source VoxCeleb videos and the names of the celebrities without…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-01 Sara Barahona , Ladislav Mošner , Themos Stafylakis , Oldřich Plchot , Junyi Peng , Lukáš Burget , Jan Černocký

Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries

In this paper, we demonstrate a method for training speaker embedding extractors using weak annotation. More specifically, we are using the full VoxCeleb recordings and the name of the celebrities appearing on each video without knowledge…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-10 Themos Stafylakis , Ladislav Mošner , Oldřich Plchot , Johan Rohdin , Anna Silnova , Lukáš Burget , Jan "Honza" Černocký

Intra-class variation reduction of speaker representation in disentanglement framework

In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent representations or embeddings containing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-05 Yoohwan Kwon , Soo-Whan Chung , Hong-Goo Kang

Probing the Information Encoded in X-vectors

Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks. In this paper, we use simple classifiers to investigate the contents encoded by…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-16 Desh Raj , David Snyder , Daniel Povey , Sanjeev Khudanpur

Graph-based Label Propagation for Semi-Supervised Speaker Identification

Speaker identification in the household scenario (e.g., for smart speakers) is typically based on only a few enrollment utterances but a much larger set of unlabeled data, suggesting semisupervised learning to improve speaker profiles. We…

Sound · Computer Science 2022-02-22 Long Chen , Venkatesh Ravichandran , Andreas Stolcke

Disentangled speaker and nuisance attribute embedding for robust speaker verification

Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Woo Hyun Kang , Sung Hwan Mun , Min Hyun Han , Nam Soo Kim

Speaker Re-identification with Speaker Dependent Speech Enhancement

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here speech enhancement methods have traditionally allowed improved…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi , Qiang Huang , Thomas Hain

Unified Hypersphere Embedding for Speaker Recognition

Incremental improvements in accuracy of Convolutional Neural Networks are usually achieved through use of deeper and more complex models trained on larger datasets. However, enlarging dataset and models increases the computation and storage…

Audio and Speech Processing · Electrical Eng. & Systems 2018-07-24 Mahdi Hajibabaei , Dengxin Dai

Augmentation adversarial training for self-supervised speaker recognition

The goal of this work is to train robust speaker recognition models without speaker labels. Recent works on unsupervised speaker representations are based on contrastive learning in which they encourage within-utterance embeddings to be…

Sound · Computer Science 2020-11-02 Jaesung Huh , Hee Soo Heo , Jingu Kang , Shinji Watanabe , Joon Son Chung

H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model

In this paper, a hierarchical attention network to generate utterance-level embeddings (H-vectors) for speaker identification is proposed. Since different parts of an utterance may have different contributions to speaker identities, the use…

Computation and Language · Computer Science 2019-10-22 Yanpei Shi , Qiang Huang , Thomas Hain

Adapting End-to-End Neural Speaker Verification to New Languages and Recording Conditions with Adversarial Training

In this article we propose a novel approach for adapting speaker embeddings to new domains based on adversarial training of neural networks. We apply our embeddings to the task of text-independent speaker verification, a challenging,…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-08 Gautam Bhattacharya , Jahangir Alam , Patrick Kenny

Disentangled Representation Learning for Environment-agnostic Speaker Recognition

This work presents a framework based on feature disentanglement to learn speaker embeddings that are robust to environmental variations. Our framework utilises an auto-encoder as a disentangler, dividing the input speaker embedding into…

Sound · Computer Science 2024-06-21 KiHyun Nam , Hee-Soo Heo , Jee-weon Jung , Joon Son Chung

Embedding-Based Speaker Adaptive Training of Deep Neural Networks

An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are a constant given a particular speaker,…

Computation and Language · Computer Science 2017-10-20 Xiaodong Cui , Vaibhava Goel , George Saon

Rethinking Speaker Embeddings for Speech Generation: Sub-Center Modeling for Capturing Intra-Speaker Diversity

Modeling the rich prosodic variations inherent in human speech is essential for generating natural-sounding speech. While speaker embeddings are commonly used as conditioning inputs in personalized speech generation, they are typically…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-22 Ismail Rasim Ulgen , John H. L. Hansen , Carlos Busso , Berrak Sisman

Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification

Recently, researchers have utilized neural network-based speaker embedding techniques in speaker-recognition tasks to identify speakers accurately. However, speaker-discriminative embeddings do not always represent speech features such as…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-24 Kwangje Baeg , Yeong-Gwan Kim , Young-Sub Han , Byoung-Ki Jeon

Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition

Speaker embeddings carry valuable emotion-related information, which makes them a promising resource for enhancing speech emotion recognition (SER), especially with limited labeled data. Traditionally, it has been assumed that emotion…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-03 Ismail Rasim Ulgen , Zongyang Du , Carlos Busso , Berrak Sisman

Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

We study a novel neural architecture and its training strategies of speaker encoder for speaker recognition without using any identity labels. The speaker encoder is trained to extract a fixed-size speaker embedding from a spoken utterance…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-28 Ruijie Tao , Kong Aik Lee , Rohan Kumar Das , Ville Hautamäki , Haizhou Li

A Study on Angular Based Embedding Learning for Text-independent Speaker Verification

Learning a good speaker embedding is important for many automatic speaker recognition tasks, including verification, identification and diarization. The embeddings learned by softmax are not discriminative enough for open-set verification…

Machine Learning · Computer Science 2019-08-13 Zhiyong Chen , Zongze Ren , Shugong Xu

Designing Neural Speaker Embeddings with Meta Learning

Neural speaker embeddings trained using classification objectives have demonstrated state-of-the-art performance in multiple applications. Typically, such embeddings are trained on an out-of-domain corpus on a single task e.g., speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-03 Manoj Kumar , Tae Jin-Park , Somer Bishop , Shrikanth Narayanan