Related papers: Self-supervised speaker embeddings

200 papers

Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages: the first extracts low-dimensional correlation embeddings from speech, and the second performs the…

In this paper, we refine and validate our method for training speaker embedding extractors using weak annotations. More specifically, we use only the audio stream of the source VoxCeleb videos and the names of the celebrities without…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-01 Sara Barahona , Ladislav Mošner , Themos Stafylakis , Oldřich Plchot , Junyi Peng , Lukáš Burget , Jan Černocký

In this paper, we demonstrate a method for training speaker embedding extractors using weak annotation. More specifically, we are using the full VoxCeleb recordings and the name of the celebrities appearing on each video without knowledge…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-10 Themos Stafylakis , Ladislav Mošner , Oldřich Plchot , Johan Rohdin , Anna Silnova , Lukáš Burget , Jan "Honza" Černocký

In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent representations or embeddings containing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-05 Yoohwan Kwon , Soo-Whan Chung , Hong-Goo Kang

Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks. In this paper, we use simple classifiers to investigate the contents encoded by…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-16 Desh Raj , David Snyder , Daniel Povey , Sanjeev Khudanpur

Speaker identification in the household scenario (e.g., for smart speakers) is typically based on only a few enrollment utterances but a much larger set of unlabeled data, suggesting semisupervised learning to improve speaker profiles. We…

Sound · Computer Science 2022-02-22 Long Chen , Venkatesh Ravichandran , Andreas Stolcke

Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Woo Hyun Kang , Sung Hwan Mun , Min Hyun Han , Nam Soo Kim

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here speech enhancement methods have traditionally allowed improved…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi , Qiang Huang , Thomas Hain

Incremental improvements in the accuracy of Convolutional Neural Networks are usually achieved through the use of deeper and more complex models trained on larger datasets. However, enlarging datasets and models increases the computation and storage…

Audio and Speech Processing · Electrical Eng. & Systems 2018-07-24 Mahdi Hajibabaei , Dengxin Dai

The goal of this work is to train robust speaker recognition models without speaker labels. Recent works on unsupervised speaker representations are based on contrastive learning in which they encourage within-utterance embeddings to be…

Sound · Computer Science 2020-11-02 Jaesung Huh , Hee Soo Heo , Jingu Kang , Shinji Watanabe , Joon Son Chung

In this paper, a hierarchical attention network to generate utterance-level embeddings (H-vectors) for speaker identification is proposed. Since different parts of an utterance may have different contributions to speaker identities, the use…

Computation and Language · Computer Science 2019-10-22 Yanpei Shi , Qiang Huang , Thomas Hain

In this article we propose a novel approach for adapting speaker embeddings to new domains based on adversarial training of neural networks. We apply our embeddings to the task of text-independent speaker verification, a challenging,…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-08 Gautam Bhattacharya , Jahangir Alam , Patrick Kenny

This work presents a framework based on feature disentanglement to learn speaker embeddings that are robust to environmental variations. Our framework utilises an auto-encoder as a disentangler, dividing the input speaker embedding into…

Sound · Computer Science 2024-06-21 KiHyun Nam , Hee-Soo Heo , Jee-weon Jung , Joon Son Chung

An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are a constant given a particular speaker,…

Computation and Language · Computer Science 2017-10-20 Xiaodong Cui , Vaibhava Goel , George Saon

Modeling the rich prosodic variations inherent in human speech is essential for generating natural-sounding speech. While speaker embeddings are commonly used as conditioning inputs in personalized speech generation, they are typically…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-22 Ismail Rasim Ulgen , John H. L. Hansen , Carlos Busso , Berrak Sisman

Recently, researchers have utilized neural network-based speaker embedding techniques in speaker-recognition tasks to identify speakers accurately. However, speaker-discriminative embeddings do not always represent speech features such as…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-24 Kwangje Baeg , Yeong-Gwan Kim , Young-Sub Han , Byoung-Ki Jeon

Speaker embeddings carry valuable emotion-related information, which makes them a promising resource for enhancing speech emotion recognition (SER), especially with limited labeled data. Traditionally, it has been assumed that emotion…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-03 Ismail Rasim Ulgen , Zongyang Du , Carlos Busso , Berrak Sisman

We study a novel neural architecture and its training strategies of speaker encoder for speaker recognition without using any identity labels. The speaker encoder is trained to extract a fixed-size speaker embedding from a spoken utterance…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-28 Ruijie Tao , Kong Aik Lee , Rohan Kumar Das , Ville Hautamäki , Haizhou Li

Learning a good speaker embedding is important for many automatic speaker recognition tasks, including verification, identification and diarization. The embeddings learned by softmax are not discriminative enough for open-set verification…

Machine Learning · Computer Science 2019-08-13 Zhiyong Chen , Zongze Ren , Shugong Xu

Neural speaker embeddings trained using classification objectives have demonstrated state-of-the-art performance in multiple applications. Typically, such embeddings are trained on an out-of-domain corpus on a single task e.g., speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-03 Manoj Kumar , Tae Jin-Park , Somer Bishop , Shrikanth Narayanan