Related papers: DropClass and DropAdapt: Dropping classes for deep…

Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

Recently, speaker embeddings extracted from a speaker discriminative deep neural network (DNN) yield better performance than the conventional methods such as i-vector. In most cases, the DNN speaker classifier is trained using cross entropy…

Audio and Speech Processing · Electrical Eng. & Systems 2019-06-19 Xu Xiang , Shuai Wang , Houjun Huang , Yanmin Qian , Kai Yu

Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation

Predicting the altered acoustic frames is an effective way of self-supervised learning for speech representation. However, it is challenging to prevent the pretrained model from overfitting. In this paper, we proposed to introduce two…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-12 Jian Luo , Jianzong Wang , Ning Cheng , Jing Xiao

Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification

Recently, researchers have utilized neural network-based speaker embedding techniques in speaker-recognition tasks to identify speakers accurately. However, speaker-discriminative embeddings do not always represent speech features such as…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-24 Kwangje Baeg , Yeong-Gwan Kim , Young-Sub Han , Byoung-Ki Jeon

Robust Speech Representation Learning via Flow-based Embedding Regularization

Over the recent years, various deep learning-based methods were proposed for extracting a fixed-dimensional embedding vector from speech signals. Although the deep learning-based embedding extraction methods have shown good performance in…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-08 Woo Hyun Kang , Jahangir Alam , Abderrahim Fathan

Leveraging speaker attribute information using multi task learning for speaker verification and diarization

Deep speaker embeddings have become the leading method for encoding speaker identity in speaker recognition tasks. The embedding space should ideally capture the variations between all possible speakers, encoding the multiple acoustic…

Sound · Computer Science 2021-04-26 Chau Luu , Peter Bell , Steve Renals

Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries

In this paper, we demonstrate a method for training speaker embedding extractors using weak annotation. More specifically, we are using the full VoxCeleb recordings and the name of the celebrities appearing on each video without knowledge…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-10 Themos Stafylakis , Ladislav Mošner , Oldřich Plchot , Johan Rohdin , Anna Silnova , Lukáš Burget , Jan "Honza" Černocký

Deep Representation Decomposition for Rate-Invariant Speaker Verification

While promising performance for speaker verification has been achieved by deep speaker embeddings, the advantage would reduce in the case of speaking-style variability. Speaking rate mismatch is often observed in practical speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-31 Fuchuan Tong , Siqi Zheng , Haodong Zhou , Xingjia Xie , Qingyang Hong , Lin Li

A Study on Angular Based Embedding Learning for Text-independent Speaker Verification

Learning a good speaker embedding is important for many automatic speaker recognition tasks, including verification, identification and diarization. The embeddings learned by softmax are not discriminative enough for open-set verification…

Machine Learning · Computer Science 2019-08-13 Zhiyong Chen , Zongze Ren , Shugong Xu

Intra-class variation reduction of speaker representation in disentanglement framework

In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent representations or embeddings containing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-05 Yoohwan Kwon , Soo-Whan Chung , Hong-Goo Kang

Speaker Re-identification with Speaker Dependent Speech Enhancement

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here speech enhancement methods have traditionally allowed improved…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi , Qiang Huang , Thomas Hain

A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. While there are alternative feature extraction methods based on phase, prosody and long-term…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-31 Xuechen Liu , Md Sahidullah , Tomi Kinnunen

Improving Speaker Identification for Shared Devices by Adapting Embeddings to Speaker Subsets

Speaker identification typically involves three stages. First, a front-end speaker embedding model is trained to embed utterance and speaker profiles. Second, a scoring function is applied between a runtime utterance and each speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-22 Zhenning Tan , Yuguang Yang , Eunjung Han , Andreas Stolcke

Disentangled speaker and nuisance attribute embedding for robust speaker verification

Over the recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as in most of the classical embedding techniques, the deep learning-based…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-10 Woo Hyun Kang , Sung Hwan Mun , Min Hyun Han , Nam Soo Kim

Embedding-Based Speaker Adaptive Training of Deep Neural Networks

An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are a constant given a particular speaker,…

Computation and Language · Computer Science 2017-10-20 Xiaodong Cui , Vaibhava Goel , George Saon

Deep clustering: Discriminative embeddings for segmentation and separation

We address the problem of acoustic source separation in a deep learning framework we call "deep clustering." Rather than directly estimating signals or masking functions, we train a deep network to produce spectrogram embeddings that are…

Neural and Evolutionary Computing · Computer Science 2015-08-19 John R. Hershey , Zhuo Chen , Jonathan Le Roux , Shinji Watanabe

Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification

In this paper, we propose self-supervised speaker representation learning strategies, which comprise of a bootstrap equilibrium speaker representation learning in the front-end and an uncertainty-aware probabilistic speaker embedding…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-28 Sung Hwan Mun , Min Hyun Han , Dongjune Lee , Jihwan Kim , Nam Soo Kim

Speaker recognition with two-step multi-modal deep cleansing

Neural network-based speaker recognition has achieved significant improvement in recent years. A robust speaker representation learns meaningful knowledge from both hard and easy samples in the training set to achieve good performance.…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-31 Ruijie Tao , Kong Aik Lee , Zhan Shi , Haizhou Li

Attention Mechanism in Speaker Recognition: What Does It Learn in Deep Speaker Embedding?

This paper presents an experimental study on deep speaker embedding with an attention mechanism that has been found to be a powerful representation learning technique in speaker recognition. In this framework, an attention model works as a…

Sound · Computer Science 2018-09-26 Qiongqiong Wang , Koji Okabe , Kong Aik Lee , Hitoshi Yamamoto , Takafumi Koshinaka

Dropout Prompt Learning: Towards Robust and Adaptive Vision-Language Models

Dropout is a widely used regularization technique which improves the generalization ability of a model by randomly dropping neurons. In light of this, we propose Dropout Prompt Learning, which aims for applying dropout to improve the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Biao Chen , Lin Zuo , Mengmeng Jing , Kunbin He , Yuchen Wang

An iterative framework for self-supervised deep speaker representation learning

In this paper, we propose an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). The framework starts with training a self-supervision speaker embedding network by maximizing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-29 Danwei Cai , Weiqing Wang , Ming Li