Related papers: Localization Based Sequential Grouping for Continu…

A Real-time Speaker Diarization System Based on Spatial Spectrum

In this paper we describe a speaker diarization system that enables localization and identification of all speakers present in a conversation or meeting. We propose a novel systematic approach to tackle several long-standing challenges in…

Sound · Computer Science · 2021-07-21 · Siqi Zheng, Weilong Huang, Xianliang Wang, Hongbin Suo, Jinwei Feng, Zhijie Yan

Linguistically Aided Speaker Diarization Using Speaker Role Information

Speaker diarization relies on the assumption that speech segments corresponding to a particular speaker are concentrated in a specific region of the speaker space, a region that represents that speaker's identity. These identities are not…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-02-09 · Nikolaos Flemotomos, Panayiotis Georgiou, Shrikanth Narayanan

Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks

While there has been a substantial amount of work in speaker diarization recently, there are few efforts that jointly employ lexical and acoustic information for speaker segmentation. Towards that, we investigate a speaker diarization system…

Audio and Speech Processing · Electrical Eng. & Systems · 2018-05-29 · Tae Jin Park, Panayiotis Georgiou

Multi-microphone Automatic Speech Segmentation in Meetings Based on Circular Harmonics Features

Speaker diarization is the task of answering "Who spoke and when?" in an audio stream. Pipeline systems rely on speech segmentation to extract speakers' segments and achieve robust speaker diarization. This paper proposes a common framework…

Sound · Computer Science · 2023-06-08 · Théo Mariotte, Anthony Larcher, Silvio Montrésor, Jean-Hugh Thomas

Speaker Diarization: Using Recurrent Neural Networks

Speaker diarization is the problem of separating speakers in an audio recording. There could be any number of speakers, and the final result should state when each speaker starts and stops. In this project, we analyze a given audio file with 2 channels and 2…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-06-11 · Vishal Sharma, Zekun Zhang, Zachary Neubert, Curtis Dyreson

Robust Target Speaker Diarization and Separation via Augmented Speaker Embedding Sampling

Traditional speech separation and speaker diarization approaches rely on prior knowledge of target speakers or a predetermined number of participants in audio signals. To address these limitations, recent advances focus on developing…

Sound · Computer Science · 2025-08-11 · Md Asif Jalal, Luca Remaggi, Vasileios Moschopoulos, Thanasis Kotsiopoulos, Vandana Rajan, Karthikeyan Saravanan, Anastasis Drosou, Junho Heo, Hyuk Oh, Seokyeong Jeong

The Cone of Silence: Speech Separation by Localization

Given a multi-microphone recording of an unknown number of speakers talking concurrently, we simultaneously localize the sources and separate the individual speakers. At the core of our method is a deep network, in the waveform domain,…

Sound · Computer Science · 2020-10-14 · Teerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizerman

A Review of Speaker Diarization: Recent Advances with Deep Learning

Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when". In the early years, speaker diarization algorithms were developed for…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-11-29 · Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

Speaker diarization (SD) is a classic task in speech processing and is crucial in multi-party scenarios such as meetings and conversations. Current mainstream speaker diarization approaches consider acoustic information only, which results in…

Computation and Language · Computer Science · 2023-05-23 · Luyao Cheng, Siqi Zheng, Zhang Qinglin, Hui Wang, Yafeng Chen, Qian Chen

Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation

This paper proposes a novel Sequence-to-Sequence Neural Diarization (S2SND) framework to perform online and offline speaker diarization. It is developed from the sequence-to-sequence architecture of our previous target-speaker voice…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-24 · Ming Cheng, Yuke Lin, Ming Li

End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of…

Sound · Computer Science · 2021-05-06 · Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

Multi-source localization is an important and challenging technique for multi-talker conversation analysis. This paper proposes a novel supervised learning method using deep neural networks to estimate the direction of arrival (DOA) of all…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-11-30 · Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization

Speaker diarization, the process of segmenting an audio stream or transcribed speech content into homogeneous partitions based on speaker identity, plays a crucial role in the interpretation and analysis of human speech. Most existing…

Machine Learning · Computer Science · 2024-08-23 · Luyao Cheng, Hui Wang, Siqi Zheng, Yafeng Chen, Rongjie Huang, Qinglin Zhang, Qian Chen, Xihao Li

Online speaker diarization of meetings guided by speech separation

Overlapped speech is notoriously problematic for speaker diarization systems. Consequently, the use of speech separation has recently been proposed to improve their performance. Although promising, speech separation models struggle with…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-02-02 · Elio Gruttadauria, Mathieu Fontaine, Slim Essid

Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization

We propose a modular pipeline for the single-channel separation, recognition, and diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset. Using a Continuous Speech Separation (CSS) system with a TF-GridNet…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-05-07 · Thilo von Neumann, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix, Reinhold Haeb-Umbach

Self-supervised Representation Learning With Path Integral Clustering For Speaker Diarization

Automatic speaker diarization techniques typically involve a two-stage processing approach where audio segments of fixed duration are converted to vector representations in the first stage. This is followed by an unsupervised clustering of…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-06-15 · Prachi Singh, Sriram Ganapathy

Speaker Diarization with Lexical Information

This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-04-16 · Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

Multi-channel Conversational Speaker Separation via Neural Diarization

When dealing with overlapped speech, the performance of automatic speech recognition (ASR) systems substantially degrades as they are designed for single-talker speech. To enhance ASR performance in conversational or meeting environments,…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-11-16 · Hassan Taherian, DeLiang Wang

LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism

The prevailing noise-resistant and reverberation-resistant localization algorithms primarily emphasize separating and providing directional output for each speaker in multi-speaker scenarios, without association with the identity of…

Sound · Computer Science · 2023-10-18 · Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li

Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation

We address talker-independent monaural speaker separation from the perspectives of deep learning and computational auditory scene analysis (CASA). Specifically, we decompose the multi-speaker separation task into the stages of simultaneous…

Sound · Computer Science · 2019-04-26 · Yuzhou Liu, DeLiang Wang