Related papers: Fully Supervised Speaker Diarization

Supervised online diarization with sample mean loss for multi-domain data

Recently, a fully supervised speaker diarization approach was proposed (UIS-RNN) which models speakers using multiple instances of a parameter-sharing recurrent neural network. In this paper we propose qualitative modifications to the model…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-14 Enrico Fini , Alessio Brutti

Speaker Diarization with Region Proposal Network

Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem. Although the standard diarization systems can achieve satisfactory results in various scenarios, they…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-18 Zili Huang , Shinji Watanabe , Yusuke Fujita , Paola Garcia , Yiwen Shao , Daniel Povey , Sanjeev Khudanpur

Speaker Diarization: Using Recurrent Neural Networks

Speaker diarization is the problem of separating speakers in an audio recording. There could be any number of speakers, and the final result should state when each speaker starts and ends. In this project, we analyze a given audio file with 2 channels and 2…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-11 Vishal Sharma , Zekun Zhang , Zachary Neubert , Curtis Dyreson

Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation

This paper proposes a novel Sequence-to-Sequence Neural Diarization (S2SND) framework to perform online and offline speaker diarization. It is developed from the sequence-to-sequence architecture of our previous target-speaker voice…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-24 Ming Cheng , Yuke Lin , Ming Li

Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings

In this paper we propose a new method of speaker diarization that employs a deep learning architecture to learn speaker embeddings. In contrast to the traditional approaches that build their speaker embeddings using manually hand-crafted…

Sound · Computer Science 2017-09-18 Pawel Cyrta , Tomasz Trzciński , Wojciech Stokowiec

Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning

This paper presents a computationally efficient and distributed speaker diarization framework for networked IoT-style audio devices. The work proposes a Federated Learning model which can identify the participants in a conversation without…

Sound · Computer Science 2024-12-02 Amit Kumar Bhuyan , Hrishikesh Dutta , Subir Biswas

Online Speaker Diarization with Relation Network

In this paper, we propose an online speaker diarization system based on Relation Network, named RenoSD. Unlike conventional diarization systems which consist of several independently-optimized modules, RenoSD implements…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-22 Xiang Li , Yucheng Zhao , Chong Luo , Wenjun Zeng

An iterative framework for self-supervised deep speaker representation learning

In this paper, we propose an iterative framework for self-supervised speaker representation learning based on a deep neural network (DNN). The framework starts with training a self-supervision speaker embedding network by maximizing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-29 Danwei Cai , Weiqing Wang , Ming Li

Speaker diarization with session-level speaker embedding refinement using graph neural networks

Deep speaker embedding models have been commonly used as a building block for speaker diarization systems; however, the speaker embedding model is usually trained according to a global loss defined on the training data, which could be…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-26 Jixuan Wang , Xiong Xiao , Jian Wu , Ranjani Ramamurthy , Frank Rudzicz , Michael Brudno

Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors

Attractor-based end-to-end diarization is achieving comparable accuracy to the carefully tuned conventional clustering-based methods on challenging datasets. However, the main drawback is that it cannot deal with the case where the number…

Audio and Speech Processing · Electrical Eng. & Systems 2021-09-24 Shota Horiguchi , Shinji Watanabe , Paola Garcia , Yawen Xue , Yuki Takashima , Yohei Kawaguchi

Robust End-to-end Speaker Diarization with Generic Neural Clustering

End-to-end speaker diarization approaches have shown exceptional performance over the traditional modular approaches. To further improve the performance of the end-to-end speaker diarization for real speech recordings, recent works have…

Sound · Computer Science 2022-04-19 Chenyu Yang , Yu Wang

Self-supervised Speaker Diarization

Over the last few years, deep learning has grown in popularity for speaker verification, identification, and diarization. Inarguably, a significant part of this success is due to the demonstrated effectiveness of their speaker…

Sound · Computer Science 2022-10-07 Yehoshua Dissen , Felix Kreuk , Joseph Keshet

Multi-channel Conversational Speaker Separation via Neural Diarization

When dealing with overlapped speech, the performance of automatic speech recognition (ASR) systems substantially degrades as they are designed for single-talker speech. To enhance ASR performance in conversational or meeting environments,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-16 Hassan Taherian , DeLiang Wang

Robust speaker recognition using unsupervised adversarial invariance

In this paper, we address the problem of speaker recognition in challenging acoustic conditions using a novel method to extract robust speaker-discriminative speech representations. We adopt a recently proposed unsupervised adversarial…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-05 Raghuveer Peri , Monisankha Pal , Arindam Jati , Krishna Somandepalli , Shrikanth Narayanan

DNN Speaker Tracking with Embeddings

In multi-speaker applications it is common to have pre-computed models from enrolled speakers. Using these models to identify the instances in which these speakers intervene in a recording is the task of speaker tracking. In this paper, we…

Sound · Computer Science 2020-07-21 Carlos Rodrigo Castillo-Sanchez , Leibny Paola Garcia-Perera , Anabel Martin-Gonzalez

Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization

Conventional methods for speaker diarization involve windowing an audio file into short segments to extract speaker embeddings, followed by an unsupervised clustering of the embeddings. This multi-step approach generates speaker assignments…

Sound · Computer Science 2023-02-27 Prachi Singh , Amrit Kaul , Sriram Ganapathy

Triplet Network with Attention for Speaker Diarization

In automatic speech processing systems, speaker diarization is a crucial front-end component to separate segments from different speakers. Inspired by the recent success of deep neural networks (DNNs) in semantic inferencing, triplet…

Audio and Speech Processing · Electrical Eng. & Systems 2018-08-07 Huan Song , Megan Willi , Jayaraman J. Thiagarajan , Visar Berisha , Andreas Spanias

Speaker Diarization with LSTM

For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications. However, mirroring the rise of deep learning in various domains, neural network based audio…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-25 Quan Wang , Carlton Downey , Li Wan , Philip Andrew Mansfield , Ignacio Lopez Moreno

Online End-to-End Neural Diarization with Speaker-Tracing Buffer

This paper proposes a novel online speaker diarization algorithm based on a fully supervised self-attention mechanism (SA-EEND). Online diarization inherently presents a speaker's permutation problem due to the possibility to assign speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-09 Yawen Xue , Shota Horiguchi , Yusuke Fujita , Shinji Watanabe , Kenji Nagamatsu

Systematic Evaluation of Online Speaker Diarization Systems Regarding their Latency

In this paper, different online speaker diarization systems are evaluated on the same hardware with the same test data with regard to their latency. The latency is the time span from audio input to the output of the corresponding speaker…

Computation and Language · Computer Science 2024-07-08 Roman Aperdannier , Sigurd Schacht , Alexander Piazza