Related papers: DOVER: A Method for Combining Diarization Outputs

Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

Speaker diarization based on bottom-up clustering of speech segments by acoustic similarity is often highly sensitive to the choice of hyperparameters, such as the initial number of clusters and feature weighting. Optimizing these…

Computation and Language · Computer Science · 2022-02-22 · Andreas Stolcke

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

Several advances have been made recently towards handling overlapping speech for speaker diarization. Since speech and natural language tasks often benefit from ensemble techniques, we propose an algorithm for combining outputs from such…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-11-05 · Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

MOVER: Combining Multiple Meeting Recognition Systems

In this paper, we propose Meeting recognizer Output Voting Error Reduction (MOVER), a novel system combination method for meeting recognition tasks. Although there are methods to combine the output of diarization (e.g., DOVER) or automatic…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-08-08 · Naoyuki Kamo, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani

DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding

We introduce DIVE, an end-to-end speaker diarization algorithm. Our neural algorithm presents the diarization task as an iterative process: it repeatedly builds a representation for each speaker before predicting the voice activity of each…

Sound · Computer Science · 2021-05-31 · Neil Zeghidour, Olivier Teboul, David Grangier

Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020

This paper describes the Microsoft speaker diarization system for monaural multi-talker recordings in the wild, evaluated at the diarization track of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. We will first explain our system…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-10-26 · Xiong Xiao, Naoyuki Kanda, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao, Gang Liu, Yu Wu, Jian Wu, Shujie Liu, Jinyu Li, Yifan Gong

A Review of Speaker Diarization: Recent Advances with Deep Learning

Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when". In the early years, speaker diarization algorithms were developed for…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-11-29 · Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan

Reformulating DOVER-Lap Label Mapping as a Graph Partitioning Problem

We recently proposed DOVER-Lap, a method for combining overlap-aware speaker diarization system outputs. DOVER-Lap improved upon its predecessor DOVER by using a label mapping method based on globally-informed greedy search. In this paper,…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-06-07 · Desh Raj, Sanjeev Khudanpur

Speaker Diarization Based on Multi-channel Microphone Array in Small-scale Meeting

In the task of speaker diarization, small-scale meetings account for a large proportion of cases. When microphone arrays are employed as recording devices, their spatial information is usually ignored by most researchers. In this…

Sound · Computer Science · 2022-10-27 · Yuxuan Du, Ruohua Zhou

Multi-scale Speaker Diarization with Dynamic Scale Weighting

Speaker diarization systems are challenged by a trade-off between the temporal resolution and the fidelity of the speaker representation. By obtaining a superior temporal resolution with an enhanced accuracy, a multi-scale approach is a way…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-03-31 · Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

Investigating Confidence Estimation Measures for Speaker Diarization

Speaker diarization systems segment a conversation recording based on the speakers' identity. Such systems can misclassify the speaker of a portion of audio due to a variety of factors, such as speech pattern variation, background noise,…

Sound · Computer Science · 2024-06-26 · Anurag Chowdhury, Abhinav Misra, Mark C. Fuhs, Monika Woszczyna

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization

Speaker diarization, the process of segmenting an audio stream or transcribed speech content into homogeneous partitions based on speaker identity, plays a crucial role in the interpretation and analysis of human speech. Most existing…

Machine Learning · Computer Science · 2024-08-23 · Luyao Cheng, Hui Wang, Siqi Zheng, Yafeng Chen, Rongjie Huang, Qinglin Zhang, Qian Chen, Xihao Li

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

Multi-source localization is an important and challenging technique for multi-talker conversation analysis. This paper proposes a novel supervised learning method using deep neural networks to estimate the direction of arrival (DOA) of all…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-11-30 · Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization

We propose a modular pipeline for the single-channel separation, recognition, and diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset. Using a Continuous Speech Separation (CSS) system with a TF-GridNet…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-05-07 · Thilo von Neumann, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix, Reinhold Haeb-Umbach

Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models

This paper investigates the use of target-speaker automatic speech recognition (TS-ASR) for simultaneous speech recognition and speaker diarization of single-channel dialogue recordings. TS-ASR is a technique to automatically extract and…

Computation and Language · Computer Science · 2019-09-19 · Naoyuki Kanda, Shota Horiguchi, Yusuke Fujita, Yawen Xue, Kenji Nagamatsu, Shinji Watanabe

Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations

Although automatic emotion recognition (AER) has recently drawn significant research interest, most current AER studies use manually segmented utterances, which are usually unavailable for dialogue systems. This paper proposes integrating…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-08-15 · Wen Wu, Chao Zhang, Philip C. Woodland

Distributed Speech Dereverberation Using Weighted Prediction Error

Speech dereverberation aims to alleviate the negative impact of late reverberant reflections. The weighted prediction error (WPE) method is a well-established technique known for its superior performance in dereverberation. However, in…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-12-07 · Ziye Yang, Mengfei Zhang, Jie Chen

Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation. With technical advances in systems dealing with speech separation, speaker diarization, and…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-11-05 · Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey

Automatic Quality Estimation for ASR System Combination

Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). In order to select the most appropriate words to insert at each position in the output transcriptions, some…

Computation and Language · Computer Science · 2017-06-23 · Shahab Jalalvand, Matteo Negri, Daniele Falavigna, Marco Matassoni, Marco Turchi

NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization

This paper details our speaker diarization system designed for multi-domain, multi-microphone casual conversations. The proposed diarization pipeline uses weighted prediction error (WPE)-based dereverberation as a front end, then applies…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-09-25 · Naohiro Tawara, Marc Delcroix, Atsushi Ando, Atsunori Ogawa

The BUCEA Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2022

This paper describes the BUCEA speaker diarization system for the 2022 VoxCeleb Speaker Recognition Challenge. VoxSRC-22 provides the development set and test set of VoxConverse, and we mainly use the test set of VoxConverse for parameter…

Sound · Computer Science · 2022-09-21 · Ruohua Zhou, Yuxuan Du, Chenlei Hu