Related papers: An iterative framework for self-supervised deep sp…

The DKU-DukeECE System for the Self-Supervision Speaker Verification Task of the 2021 VoxCeleb Speaker Recognition Challenge

This report describes the submission of the DKU-DukeECE team to the self-supervision speaker verification task of the 2021 VoxCeleb Speaker Recognition Challenge (VoxSRC). Our method employs an iterative labeling framework to learn…

Audio and Speech Processing · Electrical Eng. & Systems 2021-09-08 Danwei Cai, Ming Li

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persisting challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-28 Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persistent challenge. In this paper, we propose a novel self-supervised speaker verification approach, Self-Distillation…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-30 Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Chong Deng, Shiliang Zhang, Wen Wang

Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification

The x-vector based deep neural network (DNN) embedding systems have demonstrated effectiveness for text-independent speaker verification. This paper presents a multi-task learning architecture for training the speaker embedding DNN with the…

Audio and Speech Processing · Electrical Eng. & Systems 2019-04-05 Lanhua You, Wu Guo, Lirong Dai, Jun Du

Deep Speaker Embedding Learning with Multi-Level Pooling for Text-Independent Speaker Verification

This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both time delay neural network (TDNN) and long short-term memory neural…

Computation and Language · Computer Science 2019-02-22 Yun Tang, Guohong Ding, Jing Huang, Xiaodong He, Bowen Zhou

Improved Deep Speaker Feature Learning for Text-Dependent Speaker Recognition

A deep learning approach has been proposed recently to derive speaker identities (d-vector) by a deep neural network (DNN). This approach has been applied to text-dependent speaker recognition tasks and shows reasonable performance gains…

Computation and Language · Computer Science 2015-06-30 Lantian Li, Yiye Lin, Zhiyong Zhang, Dong Wang

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis

This paper proposes novel algorithms for speaker embedding using subjective inter-speaker similarity based on deep neural networks (DNNs). Although conventional DNN-based speaker embedding such as a $d$-vector can be applied to…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-22 Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

Recently, speaker embeddings extracted from a speaker discriminative deep neural network (DNN) yield better performance than the conventional methods such as i-vector. In most cases, the DNN speaker classifier is trained using cross entropy…

Audio and Speech Processing · Electrical Eng. & Systems 2019-06-19 Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian, Kai Yu

Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning

Speaker representation learning is crucial for voice recognition systems, with recent advances in self-supervised approaches reducing dependency on labeled data. Current two-stage iterative frameworks, while effective, suffer from…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-03 Danwei Cai, Zexin Cai, Ze Li, Ming Li

Self-supervised Speaker Diarization

Over the last few years, deep learning has grown in popularity for speaker verification, identification, and diarization. Inarguably, a significant part of this success is due to the demonstrated effectiveness of their speaker…

Sound · Computer Science 2022-10-07 Yehoshua Dissen, Felix Kreuk, Joseph Keshet

Towards Structured Deep Neural Network for Automatic Speech Recognition

In this paper we propose the Structured Deep Neural Network (structured DNN) as a structured and deep learning framework. This approach can learn to find the best structured object (such as a label sequence) given a structured input (such…

Computation and Language · Computer Science 2015-11-10 Yi-Hsiu Liao, Hung-yi Lee, Lin-shan Lee

Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning

State-of-the-art speaker verification systems are inherently dependent on some kind of human supervision as they are trained on massive amounts of labeled data. However, manually annotating utterances is slow, expensive and not scalable to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-25 Théo Lepage, Réda Dehak

Deep Speaker: an End-to-End Neural Speaker Embedding System

We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity. The embeddings generated by Deep Speaker can be used for many tasks, including…

Computation and Language · Computer Science 2017-05-08 Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li, Xuewei Zhang, Xiao Liu, Ying Cao, Ajay Kannan, Zhenyao Zhu

Towards evolution of Deep Neural Networks through contrastive Self-Supervised learning

Deep Neural Networks (DNNs) have been successfully applied to a wide range of problems. However, two main limitations are commonly pointed out. The first one is that they require a long time to design. The other is that they heavily rely on…

Neural and Evolutionary Computing · Computer Science 2024-06-21 Adriano Vinhas, João Correia, Penousal Machado

Deep Normalization for Speaker Vectors

Deep speaker embedding has demonstrated state-of-the-art performance in speaker recognition tasks. However, one potential issue with this approach is that the speaker vectors derived from deep embedding models tend to be non-Gaussian for…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-03 Yunqi Cai, Lantian Li, Dong Wang, Andrew Abel

Augmentation adversarial training for self-supervised speaker recognition

The goal of this work is to train robust speaker recognition models without speaker labels. Recent works on unsupervised speaker representations are based on contrastive learning in which they encourage within-utterance embeddings to be…

Sound · Computer Science 2020-11-02 Jaesung Huh, Hee Soo Heo, Jingu Kang, Shinji Watanabe, Joon Son Chung

Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings

In this paper we propose a new method of speaker diarization that employs a deep learning architecture to learn speaker embeddings. In contrast to the traditional approaches that build their speaker embeddings using manually hand-crafted…

Sound · Computer Science 2017-09-18 Pawel Cyrta, Tomasz Trzciński, Wojciech Stokowiec

Embedding-Based Speaker Adaptive Training of Deep Neural Networks

An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are a constant given a particular speaker,…

Computation and Language · Computer Science 2017-10-20 Xiaodong Cui, Vaibhava Goel, George Saon

Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels

Iterative self-training, or iterative pseudo-labeling (IPL) -- using an improved model from the current iteration to provide pseudo-labels for the next iteration -- has proven to be a powerful approach to enhance the quality of speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-22 Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Li-Wei Chen, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Tatiana Likhomanenko, Barry-John Theobald

Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation

The task of estimating the maximum number of concurrent speakers from single channel mixtures is important for various audio-based applications, such as blind source separation, speaker diarisation, audio surveillance or auditory scene…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-05 Fabian-Robert Stöter, Soumitro Chakrabarty, Bernd Edler, Emanuël A. P. Habets