Related papers: Self-supervised Speaker Diarization

Graph-based Label Propagation for Semi-Supervised Speaker Identification

Speaker identification in the household scenario (e.g., for smart speakers) is typically based on only a few enrollment utterances but a much larger set of unlabeled data, suggesting semisupervised learning to improve speaker profiles. We…

Sound · Computer Science 2022-02-22 Long Chen , Venkatesh Ravichandran , Andreas Stolcke

Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning

State-of-the-art speaker verification systems are inherently dependent on some kind of human supervision as they are trained on massive amounts of labeled data. However, manually annotating utterances is slow, expensive and not scalable to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-25 Théo Lepage , Réda Dehak

Robust Target Speaker Diarization and Separation via Augmented Speaker Embedding Sampling

Traditional speech separation and speaker diarization approaches rely on prior knowledge of target speakers or a predetermined number of participants in audio signals. To address these limitations, recent advances focus on developing…

Sound · Computer Science 2025-08-11 Md Asif Jalal , Luca Remaggi , Vasileios Moschopoulos , Thanasis Kotsiopoulos , Vandana Rajan , Karthikeyan Saravanan , Anastasis Drosou , Junho Heo , Hyuk Oh , Seokyeong Jeong

Semi-supervised multi-channel speaker diarization with cross-channel attention

Most neural speaker diarization systems rely on sufficient manual training data labels, which are hard to collect under real-world scenarios. This paper proposes a semi-supervised speaker diarization system to utilize large-scale…

Audio and Speech Processing · Electrical Eng. & Systems 2023-07-18 Shilong Wu , Jun Du , Maokui He , Shutong Niu , Hang Chen , Haitao Tang , Chin-Hui Lee

Pretraining Multi-Speaker Identification for Neural Speaker Diarization

End-to-end speaker diarization enables accurate overlap-aware diarization by jointly estimating multiple speakers' speech activities in parallel. This approach is data-hungry, requiring a large amount of labeled conversational data, which…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-02 Shota Horiguchi , Atsushi Ando , Marc Delcroix , Naohiro Tawara

Leveraging speaker attribute information using multi task learning for speaker verification and diarization

Deep speaker embeddings have become the leading method for encoding speaker identity in speaker recognition tasks. The embedding space should ideally capture the variations between all possible speakers, encoding the multiple acoustic…

Sound · Computer Science 2021-04-26 Chau Luu , Peter Bell , Steve Renals

Self-supervised Speaker Recognition Training Using Human-Machine Dialogues

Speaker recognition, recognizing speaker identities based on voice alone, enables important downstream applications, such as personalization and authentication. Learning speaker representations, in the context of supervised learning,…

Machine Learning · Computer Science 2022-07-13 Metehan Cekic , Ruirui Li , Zeya Chen , Yuguang Yang , Andreas Stolcke , Upamanyu Madhow

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

Overlapping speech diarization has been traditionally treated as a multi-label classification problem. In this paper, we reformulate this task as a single-label prediction problem by encoding multiple binary labels into a single label with…

Sound · Computer Science 2022-04-01 Zhihao Du , Shiliang Zhang , Siqi Zheng , Zhijie Yan

A Review of Speaker Diarization: Recent Advances with Deep Learning

Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when". In the early years, speaker diarization algorithms were developed for…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-29 Tae Jin Park , Naoyuki Kanda , Dimitrios Dimitriadis , Kyu J. Han , Shinji Watanabe , Shrikanth Narayanan

An iterative framework for self-supervised deep speaker representation learning

In this paper, we propose an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). The framework starts with training a self-supervision speaker embedding network by maximizing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-29 Danwei Cai , Weiqing Wang , Ming Li

Semi-Supervised Training with Pseudo-Labeling for End-to-End Neural Diarization

In this paper, we present a semi-supervised training technique using pseudo-labeling for end-to-end neural diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially in…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-10 Yuki Takashima , Yusuke Fujita , Shota Horiguchi , Shinji Watanabe , Paola García , Kenji Nagamatsu

Self-Supervised Metric Learning With Graph Clustering For Speaker Diarization

In this paper, we propose a novel algorithm for speaker diarization using metric learning for graph based clustering. The graph clustering algorithms use an adjacency matrix consisting of similarity scores. These scores are computed between…

Audio and Speech Processing · Electrical Eng. & Systems 2021-09-15 Prachi Singh , Sriram Ganapathy

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

Overlapping speech diarization is always treated as a multi-label classification problem. In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with power set. Specifically, we…

Sound · Computer Science 2021-11-30 Zhihao Du , Shiliang Zhang , Siqi Zheng , Weilong Huang , Ming Lei

Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders

Existing speaker diarization systems typically rely on large amounts of manually annotated data, which is labor-intensive and difficult to obtain, especially in real-world scenarios. Additionally, language-specific constraints in these…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-13 Phat Lam , Lam Pham , Truong Nguyen , Dat Ngo , Thinh Pham , Tin Nguyen , Loi Khanh Nguyen , Alexander Schindler

Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors

Attractor-based end-to-end diarization is achieving comparable accuracy to the carefully tuned conventional clustering-based methods on challenging datasets. However, the main drawback is that it cannot deal with the case where the number…

Audio and Speech Processing · Electrical Eng. & Systems 2021-09-24 Shota Horiguchi , Shinji Watanabe , Paola Garcia , Yawen Xue , Yuki Takashima , Yohei Kawaguchi

Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning

This paper presents a computationally efficient and distributed speaker diarization framework for networked IoT-style audio devices. The work proposes a Federated Learning model which can identify the participants in a conversation without…

Sound · Computer Science 2024-12-02 Amit Kumar Bhuyan , Hrishikesh Dutta , Subir Biswas

Self-Supervised Speech Representation Learning: A Review

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and…

Computation and Language · Computer Science 2022-11-23 Abdelrahman Mohamed , Hung-yi Lee , Lasse Borgholt , Jakob D. Havtorn , Joakim Edin , Christian Igel , Katrin Kirchhoff , Shang-Wen Li , Karen Livescu , Lars Maaløe , Tara N. Sainath , Shinji Watanabe

Cluster-Guided Unsupervised Domain Adaptation for Deep Speaker Embedding

Recent studies have shown that pseudo labels can contribute to unsupervised domain adaptation (UDA) for speaker verification. Inspired by the self-training strategies that use an existing classifier to label the unlabeled data for…

Machine Learning · Computer Science 2023-06-21 Haiquan Mao , Feng Hong , Man-wai Mak

DNN Speaker Tracking with Embeddings

In multi-speaker applications is common to have pre-computed models from enrolled speakers. Using these models to identify the instances in which these speakers intervene in a recording is the task of speaker tracking. In this paper, we…

Sound · Computer Science 2020-07-21 Carlos Rodrigo Castillo-Sanchez , Leibny Paola Garcia-Perera , Anabel Martin-Gonzalez

A Reinforcement Learning Framework for Online Speaker Diarization

Speaker diarization is a task to label an audio or video recording with the identity of the speaker at each given time stamp. In this work, we propose a novel machine learning framework to conduct real-time multi-speaker diarization and…

Sound · Computer Science 2023-02-23 Baihan Lin , Xinxin Zhang