Related papers: Aligning Speakers: Evaluating and Visualizing Text…

200 papers

We present a novel approach to Speaker Diarization (SD) by leveraging text-based methods focused on Sentence-level Speaker Change Detection within dialogues. Unlike audio-based SD systems, which are often challenged by audio quality and…

Computation and Language · Computer Science 2025-06-16 Peilin Wu, Jinho D. Choi

Speech applications dealing with conversations require not only recognizing the spoken words, but also determining who spoke when. The task of assigning words to speakers is typically addressed by merging the outputs of two separate…

Computation and Language · Computer Science 2019-07-12 Laurent El Shafey, Hagen Soltau, Izhak Shafran

Speaker diarization (SD) is a classic task in speech processing and is crucial in multi-party scenarios such as meetings and conversations. Current mainstream speaker diarization approaches consider acoustic information only, which results in…

Computation and Language · Computer Science 2023-05-23 Luyao Cheng, Siqi Zheng, Zhang Qinglin, Hui Wang, Yafeng Chen, Qian Chen

This paper proposes a novel Sequence-to-Sequence Neural Diarization (S2SND) framework to perform online and offline speaker diarization. It is developed from the sequence-to-sequence architecture of our previous target-speaker voice…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-24 Ming Cheng, Yuke Lin, Ming Li

Speaker diarization (SD) is typically used with an automatic speech recognition (ASR) system to ascribe speaker labels to recognized words. The conventional approach reconciles outputs from independently optimized ASR and SD systems, where…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-20 Rohit Paturi, Sundararajan Srinivasan, Xiang Li

Speaker diarization systems are challenged by a trade-off between the temporal resolution and the fidelity of the speaker representation. By obtaining a superior temporal resolution with an enhanced accuracy, a multi-scale approach is a way…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-31 Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

Overlapping speech diarization is always treated as a multi-label classification problem. In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with power set. Specifically, we…

Sound · Computer Science 2021-11-30 Zhihao Du, Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei

Speech recognition (ASR) and speaker diarization (SD) models have traditionally been trained separately to produce rich conversation transcripts with speaker labels. Recent advances have shown that joint ASR and SD models can learn to…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-06 Huanru Henry Mao, Shuyang Li, Julian McAuley, Garrison Cottrell

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and sometimes when it was spoken. Recent…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-27 Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li

The conversation scenario is one of the most important and most challenging scenarios for speech processing technologies because people in conversation respond to each other in a casual style. Detecting the speech activities of each person…

Computation and Language · Computer Science 2022-08-18 Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan

Modern neural networks have greatly improved performance across speech recognition benchmarks. However, gains are often driven by frequent words with limited semantic weight, which can obscure meaningful differences in word error rate, the…

Computation and Language · Computer Science 2026-04-21 Lasse Borgholt, Jakob Havtorn, Christian Igel, Lars Maaløe, Zheng-Hua Tan

Diarization is a crucial component in meeting transcription systems to ease the challenges of speech enhancement and attribute the transcriptions to the correct speaker. Particularly in the presence of overlapping or noisy speech, these…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-06 Christoph Boeddeker, Tobias Cord-Landwehr, Reinhold Haeb-Umbach

Speaker diarization, which is to find the speech segments of specific speakers, has been widely used in human-centered applications such as video conferences or human-computer interaction systems. In this paper, we propose a self-supervised…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-14 Yifan Ding, Yong Xu, Shi-Xiong Zhang, Yahuan Cong, Liqiang Wang

End-to-end speaker diarization enables accurate overlap-aware diarization by jointly estimating multiple speakers' speech activities in parallel. This approach is data-hungry, requiring a large amount of labeled conversational data, which…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-02 Shota Horiguchi, Atsushi Ando, Marc Delcroix, Naohiro Tawara

We propose a new method for speaker diarization that can handle overlapping speech with 2+ people. Our method is based on compositional embeddings [1]: Like standard speaker embedding methods such as x-vector [2], compositional embedding…

Sound · Computer Science 2021-02-11 Zeqian Li, Jacob Whitehill

Speaker Diarization (SD) is a crucial component of modern end-to-end ASR pipelines. Traditional SD systems, which are typically audio-based and operate independently of ASR, often introduce speaker errors, particularly during speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-16 Anurag Kumar, Rohit Paturi, Amber Afshan, Sundararajan Srinivasan

When dealing with overlapped speech, the performance of automatic speech recognition (ASR) systems substantially degrades as they are designed for single-talker speech. To enhance ASR performance in conversational or meeting environments,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-11-16 Hassan Taherian, DeLiang Wang

Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when". In the early years, speaker diarization algorithms were developed for…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-29 Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan

In recent years, speaker diarization has attracted widespread attention. To achieve better performance, some studies propose to diarize speech in multiple stages. Although these methods might bring additional benefits, most of them are…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-19 Jiangyu Han, Yuhang Cao, Heng Lu, Yanhua Long

Speaker diarization is an essential step for processing multi-speaker audio. Although an end-to-end neural diarization (EEND) method achieved state-of-the-art performance, it is limited to a fixed number of speakers. In this paper, we solve…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-03 Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Jing Shi, Kenji Nagamatsu