Related papers: Aligning Speakers: Evaluating and Visualizing Text…

Do We Still Need Audio? Rethinking Speaker Diarization with a Text-Based Approach Using Multiple Prediction Models

We present a novel approach to Speaker Diarization (SD) by leveraging text-based methods focused on Sentence-level Speaker Change Detection within dialogues. Unlike audio-based SD systems, which are often challenged by audio quality and…

Computation and Language · Computer Science · 2025-06-16 · Peilin Wu, Jinho D. Choi

Joint Speech Recognition and Speaker Diarization via Sequence Transduction

Speech applications dealing with conversations require not only recognizing the spoken words, but also determining who spoke when. The task of assigning words to speakers is typically addressed by merging the outputs of two separate…

Computation and Language · Computer Science · 2019-07-12 · Laurent El Shafey, Hagen Soltau, Izhak Shafran

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

Speaker diarization (SD) is a classic task in speech processing and is crucial in multi-party scenarios such as meetings and conversations. Current mainstream speaker diarization approaches consider acoustic information only, which results in…

Computation and Language · Computer Science · 2023-05-23 · Luyao Cheng, Siqi Zheng, Zhang Qinglin, Hui Wang, Yafeng Chen, Qian Chen

Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation

This paper proposes a novel Sequence-to-Sequence Neural Diarization (S2SND) framework to perform online and offline speaker diarization. It is developed from the sequence-to-sequence architecture of our previous target-speaker voice…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-24 · Ming Cheng, Yuke Lin, Ming Li

Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction

Speaker diarization (SD) is typically used with an automatic speech recognition (ASR) system to ascribe speaker labels to recognized words. The conventional approach reconciles outputs from independently optimized ASR and SD systems, where…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-06-20 · Rohit Paturi, Sundararajan Srinivasan, Xiang Li

Multi-scale Speaker Diarization with Dynamic Scale Weighting

Speaker diarization systems are challenged by a trade-off between the temporal resolution and the fidelity of the speaker representation. By obtaining a superior temporal resolution with an enhanced accuracy, a multi-scale approach is a way…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-03-31 · Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

Overlapping speech diarization is always treated as a multi-label classification problem. In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with power set. Specifically, we…

Sound · Computer Science · 2021-11-30 · Zhihao Du, Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei

Speech Recognition and Multi-Speaker Diarization of Long Conversations

Speech recognition (ASR) and speaker diarization (SD) models have traditionally been trained separately to produce rich conversation transcripts with speaker labels. Recent advances have shown that joint ASR and SD models can learn to…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-11-06 · Huanru Henry Mao, Shuyang Li, Julian McAuley, Garrison Cottrell

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and sometimes when it was spoken. Recent…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-04-27 · Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li

The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines

The conversation scenario is one of the most important and most challenging scenarios for speech processing technologies because people in conversation respond to each other in a casual style. Detecting the speech activities of each person…

Computation and Language · Computer Science · 2022-08-18 · Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan

A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems

Modern neural networks have greatly improved performance across speech recognition benchmarks. However, gains are often driven by frequent words with limited semantic weight, which can obscure meaningful differences in word error rate, the…

Computation and Language · Computer Science · 2026-04-21 · Lasse Borgholt, Jakob Havtorn, Christian Igel, Lars Maaløe, Zheng-Hua Tan

Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment

Diarization is a crucial component in meeting transcription systems to ease the challenges of speech enhancement and attribute the transcriptions to the correct speaker. Particularly in the presence of overlapping or noisy speech, these…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-06-06 · Christoph Boeddeker, Tobias Cord-Landwehr, Reinhold Haeb-Umbach

Self-supervised learning for audio-visual speaker diarization

Speaker diarization, which is to find the speech segments of specific speakers, has been widely used in human-centered applications such as video conferences or human-computer interaction systems. In this paper, we propose a self-supervised…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-02-14 · Yifan Ding, Yong Xu, Shi-Xiong Zhang, Yahuan Cong, Liqiang Wang

Pretraining Multi-Speaker Identification for Neural Speaker Diarization

End-to-end speaker diarization enables accurate overlap-aware diarization by jointly estimating multiple speakers' speech activities in parallel. This approach is data-hungry, requiring a large amount of labeled conversational data, which…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-02 · Shota Horiguchi, Atsushi Ando, Marc Delcroix, Naohiro Tawara

Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers

We propose a new method for speaker diarization that can handle overlapping speech with 2+ people. Our method is based on compositional embeddings [1]: Like standard speaker embedding methods such as x-vector [2], compositional embedding…

Sound · Computer Science · 2021-02-11 · Zeqian Li, Jacob Whitehill

SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models

Speaker Diarization (SD) is a crucial component of modern end-to-end ASR pipelines. Traditional SD systems, which are typically audio-based and operate independently of ASR, often introduce speaker errors, particularly during speaker…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-01-16 · Anurag Kumar, Rohit Paturi, Amber Afshan, Sundararajan Srinivasan

Multi-channel Conversational Speaker Separation via Neural Diarization

When dealing with overlapped speech, the performance of automatic speech recognition (ASR) systems substantially degrades as they are designed for single-talker speech. To enhance ASR performance in conversational or meeting environments,…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-11-16 · Hassan Taherian, DeLiang Wang

A Review of Speaker Diarization: Recent Advances with Deep Learning

Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when". In the early years, speaker diarization algorithms were developed for…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-11-29 · Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan

DiaCorrect: End-to-end error correction for speaker diarization

In recent years, speaker diarization has attracted widespread attention. To achieve better performance, some studies propose to diarize speech in multiple stages. Although these methods might bring additional benefits, most of them are…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-09-19 · Jiangyu Han, Yuhang Cao, Heng Lu, Yanhua Long

Neural Speaker Diarization with Speaker-Wise Chain Rule

Speaker diarization is an essential step for processing multi-speaker audio. Although an end-to-end neural diarization (EEND) method achieved state-of-the-art performance, it is limited to a fixed number of speakers. In this paper, we solve…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-06-03 · Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Jing Shi, Kenji Nagamatsu