Related papers: Constrained speaker linking

Speaker attribution with voice profiles by graph-based semi-supervised learning

Speaker attribution is required in many real-world applications, such as meeting transcription, where speaker identity is assigned to each utterance according to speaker voice profiles. In this paper, we propose to solve the speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-09 Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno

Distributed speech separation in spatially unconstrained microphone arrays

Speech separation with several speakers is a challenging task because of the non-stationarity of the speech and the strong signal similarity between interferent sources. Current state-of-the-art solutions can separate well the different…

Signal Processing · Electrical Eng. & Systems 2021-02-09 Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

Identifying multiple speakers without knowing where a speaker's voice is in a recording is a challenging task. In this paper, a hierarchical attention network is proposed to solve a weakly labelled speaker identification problem. The use of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi, Qiang Huang, Thomas Hain

U-vectors: Generating clusterable speaker embedding from unlabeled data

Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages: the first stage extracts low-dimensional correlation embeddings from speech, and the second performs the…

Sound · Computer Science 2021-10-25 M. F. Mridha, Abu Quwsar Ohi, Muhammad Mostafa Monowar, Md. Abdul Hamid, Md. Rashedul Islam, Yutaka Watanobe

Usable Speech Assignment for Speaker Identification under Co-Channel Situation

Usable speech criteria are proposed to extract minimally corrupted speech for speaker identification (SID) in co-channel speech. In co-channel speech, either speaker can randomly appear as the stronger speaker or the weaker one at a time.…

Sound · Computer Science 2013-01-03 Wajdi Ghezaiel, Amel Ben Slimane, Ezzedine Ben Braiek

Gaussian-Constrained training for speaker verification

Neural models, in particular the d-vector and x-vector architectures, have produced state-of-the-art performance on many speaker verification tasks. However, two potential problems of these neural models deserve more investigation. Firstly,…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-19 Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

Speaker diarization has gained considerable attention within the speech processing research community. Mainstream speaker diarization systems rely primarily on speakers' voice characteristics extracted from acoustic signals and often overlook the…

Sound · Computer Science 2024-02-06 Luyao Cheng, Siqi Zheng, Qinglin Zhang, Hui Wang, Yafeng Chen, Qian Chen, Shiliang Zhang

Improving Source Separation via Multi-Speaker Representations

Lately there have been novel developments in deep learning towards solving the cocktail party problem. Initial results are very promising and allow for more research in the domain. One technique that has not yet been explored in the neural…

Sound · Computer Science 2017-08-30 Jeroen Zegers, Hugo Van hamme

Single channel voice separation for unknown number of speakers under reverberant and noisy settings

We present a unified network for voice separation of an unknown number of speakers. The proposed approach is composed of several separation heads optimized together with a speaker classification branch. The separation is carried out in the…

Sound · Computer Science 2020-11-05 Shlomo E. Chazan, Lior Wolf, Eliya Nachmani, Yossi Adi

Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings

Speaker embeddings are promising identity-related features that can enhance the identity assignment performance of a tracking system by leveraging its spatial predictions, i.e., by performing identity reassignment. Common speaker embedding…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-21 Taous Iatariene, Alexandre Guérin, Romain Serizel

Deep clustering: Discriminative embeddings for segmentation and separation

We address the problem of acoustic source separation in a deep learning framework we call "deep clustering." Rather than directly estimating signals or masking functions, we train a deep network to produce spectrogram embeddings that are…

Neural and Evolutionary Computing · Computer Science 2015-08-19 John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe

Robust speaker recognition using unsupervised adversarial invariance

In this paper, we address the problem of speaker recognition in challenging acoustic conditions using a novel method to extract robust speaker-discriminative speech representations. We adopt a recently proposed unsupervised adversarial…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-05 Raghuveer Peri, Monisankha Pal, Arindam Jati, Krishna Somandepalli, Shrikanth Narayanan

Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning

This paper presents a computationally efficient and distributed speaker diarization framework for networked IoT-style audio devices. The work proposes a Federated Learning model which can identify the participants in a conversation without…

Sound · Computer Science 2024-12-02 Amit Kumar Bhuyan, Hrishikesh Dutta, Subir Biswas

Multimodal Clustering with Role Induced Constraints for Speaker Diarization

Speaker clustering is an essential step in conventional speaker diarization systems and is typically addressed as an audio-only speech processing task. The language used by the participants in a conversation, however, carries additional…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-12 Nikolaos Flemotomos, Shrikanth Narayanan

Reformulating Speaker Diarization as Community Detection With Emphasis On Topological Structure

Clustering-based speaker diarization has stood firm as one of the major approaches in reality, despite recent development in end-to-end diarization. However, clustering methods have not been explored extensively for speaker diarization.…

Sound · Computer Science 2022-04-27 Siqi Zheng, Hongbin Suo

Improved Relation Networks for End-to-End Speaker Verification and Identification

Speaker identification systems in a real-world scenario are tasked to identify a speaker amongst a set of enrolled speakers given just a few samples for each enrolled speaker. This paper demonstrates the effectiveness of meta-learning and…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-25 Ashutosh Chaubey, Sparsh Sinha, Susmita Ghose

Graph-based Label Propagation for Semi-Supervised Speaker Identification

Speaker identification in the household scenario (e.g., for smart speakers) is typically based on only a few enrollment utterances but a much larger set of unlabeled data, suggesting semi-supervised learning to improve speaker profiles. We…

Sound · Computer Science 2022-02-22 Long Chen, Venkatesh Ravichandran, Andreas Stolcke

Improving Speaker Identification for Shared Devices by Adapting Embeddings to Speaker Subsets

Speaker identification typically involves three stages. First, a front-end speaker embedding model is trained to embed utterance and speaker profiles. Second, a scoring function is applied between a runtime utterance and each speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-22 Zhenning Tan, Yuguang Yang, Eunjung Han, Andreas Stolcke

DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments

In multilingual societies, social conversations often involve code-mixed speech. The current speech technology may not be well equipped to extract information from multi-lingual multi-speaker conversations. The DISPLACE challenge entails a…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-06 Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil Jain, Pratik Roy Chowdhuri, Kaustubh Kulkarni, Swapnil Padhi, Deepu Vijayasenan, Sriram Ganapathy

Speaker Clustering With Neural Networks And Audio Processing

Speaker clustering is the task of differentiating speakers in a recording. In a way, the aim is to answer "who spoke when" in audio recordings. A common method used in industry is feature extraction directly from the recording thanks to…

Sound · Computer Science 2018-03-23 Maxime Jumelle, Taqiyeddine Sakmeche