Related papers: Exploring Binary Classification Loss For Speaker V…

SphereFace2: Binary Classification is All You Need for Deep Face Recognition

State-of-the-art deep face recognition methods are mostly trained with a softmax-based multi-class classification framework. Despite being popular and effective, these methods still have a few shortcomings that limit empirical performance.…

Computer Vision and Pattern Recognition · Computer Science 2022-04-12 Yandong Wen, Weiyang Liu, Adrian Weller, Bhiksha Raj, Rita Singh

Binary Neural Network for Speaker Verification

Although deep neural networks are successful for many tasks in the speech domain, the high computational and memory costs of deep neural networks make it difficult to directly deploy high-performance neural network systems on low-resource…

Sound · Computer Science 2021-04-07 Tinglong Zhu, Xiaoyi Qin, Ming Li

Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

Identifying multiple speakers without knowing where a speaker's voice is in a recording is a challenging task. In this paper, a hierarchical attention network is proposed to solve a weakly labelled speaker identification problem. The use of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi, Qiang Huang, Thomas Hain

A Speaker Verification Backend with Robust Performance across Conditions

In this paper, we address the problem of speaker verification in conditions unseen or unknown during development. A standard method for speaker verification consists of extracting speaker embeddings with a deep neural network and processing…

Sound · Computer Science 2021-08-18 Luciana Ferrer, Mitchell McLaren, Niko Brummer

On the Use of Self-Supervised Representation Learning for Speaker Diarization and Separation

Self-supervised speech models such as wav2vec2.0 and WavLM have been shown to significantly improve the performance of many downstream speech tasks, especially in low-resource settings, over the past few years. Despite this, evaluations on…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-18 Séverin Baroudi, Hervé Bredin, Joseph Razik, Ricard Marxer

Improving on-device speaker verification using federated learning with privacy

Information on speaker characteristics can be useful as side information in improving speaker recognition accuracy. However, such information is often private. This paper investigates how privacy-preserving learning can improve a speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-07 Filip Granqvist, Matt Seigel, Rogier van Dalen, Áine Cahill, Stephen Shum, Matthias Paulik

Generalized End-to-End Loss for Speaker Verification

In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function. Unlike TE2E, the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-10 Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno

Collar-aware Training for Streaming Speaker Change Detection in Broadcast Speech

In this paper, we present a novel training method for speaker change detection models. Speaker change detection is often viewed as a binary sequence labelling problem. The main challenges with this approach are the vagueness of annotated…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-17 Joonas Kalda, Tanel Alumäe

Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0

Self-supervised learning approaches have lately achieved great success on a broad spectrum of machine learning problems. In the field of speech processing, one of the most successful recent self-supervised models is wav2vec 2.0. In this…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-10 Marie Kunešová, Zbyněk Zajíc

Disentangled representation learning for multilingual speaker recognition

The goal of this paper is to learn robust speaker representations for bilingual speaking scenarios. The majority of the world's population speak at least two languages; however, most speaker recognition systems fail to recognise the same…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-08 Kihyun Nam, Youkyum Kim, Jaesung Huh, Hee Soo Heo, Jee-weon Jung, Joon Son Chung

A comparative study of several parameterizations for speaker recognition

This paper presents an exhaustive study of the robustness of several parameterizations in speaker verification and identification tasks. We have studied several mismatch conditions: different recording sessions, microphones, and…

Sound · Computer Science 2022-03-02 Marcos Faundez-Zanuy

Neural Scoring: A Refreshed End-to-End Approach for Speaker Recognition in Complex Conditions

Modern speaker verification systems primarily rely on speaker embeddings, followed by verification based on cosine similarity between the embedding vectors of the enrollment and test utterances. While effective, these methods struggle with…

Sound · Computer Science 2025-07-04 Wan Lin, Junhui Chen, Tianhao Wang, Zhenyu Zhou, Lantian Li, Dong Wang

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

Overlapping speech diarization has been traditionally treated as a multi-label classification problem. In this paper, we reformulate this task as a single-label prediction problem by encoding multiple binary labels into a single label with…

Sound · Computer Science 2022-04-01 Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan

A Reinforcement Learning Framework for Online Speaker Diarization

Speaker diarization is a task to label an audio or video recording with the identity of the speaker at each given time stamp. In this work, we propose a novel machine learning framework to conduct real-time multi-speaker diarization and…

Sound · Computer Science 2023-02-23 Baihan Lin, Xinxin Zhang

Exploring wav2vec 2.0 on speaker verification and language identification

Wav2vec 2.0 is a recently proposed self-supervised framework for speech representation learning. It follows a two-stage training process of pre-training and fine-tuning, and performs well in speech recognition tasks, especially ultra-low…

Sound · Computer Science 2021-01-15 Zhiyun Fan, Meng Li, Shiyu Zhou, Bo Xu

Speaker Re-identification with Speaker Dependent Speech Enhancement

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here speech enhancement methods have traditionally allowed improved…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi, Qiang Huang, Thomas Hain

Robust Training for Speaker Verification against Noisy Labels

The deep learning models used for speaker verification rely heavily on large amounts of data and correct labeling. However, noisy (incorrect) labels often occur, which degrades the performance of the system. In this paper, we propose a…

Sound · Computer Science 2026-04-29 Zhihua Fang, Liang He, Hanhan Ma, Xiaochen Guo, Lin Li

What and When to Learn: CURriculum Ranking Loss for Large-Scale Speaker Verification

Speaker verification at large scale remains an open challenge as fixed-margin losses treat all samples equally regardless of quality. We hypothesize that mislabeled or degraded samples introduce noisy gradients that disrupt compact speaker…

Sound · Computer Science 2026-03-26 Massa Baali, Sarthak Bisht, Rita Singh, Bhiksha Raj

Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning

State-of-the-art speaker verification systems are inherently dependent on some kind of human supervision as they are trained on massive amounts of labeled data. However, manually annotating utterances is slow, expensive, and not scalable to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-25 Théo Lepage, Réda Dehak

SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion

This paper proposes an improved approach for open-set speaker identification based on pretrained speaker foundation models. Building upon the previous Speaker Reciprocal Points Learning framework (V1), we first introduce an enhanced…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-16 Zhiyong Chen, Shuhang Wu, Yingjie Duan, Xinkang Xu, Xinhui Hu