Related papers: Self-supervised learning for audio-visual speaker …

200 papers

Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when". In the early years, speaker diarization algorithms were developed for…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-11-29 · Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan

Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants…

Computer Vision and Pattern Recognition · Computer Science · 2018-10-15 · Israel D. Gebru, Silèye Ba, Xiaofei Li, Radu Horaud

Speaker diarization, the process of segmenting an audio stream or transcribed speech content into homogenous partitions based on speaker identity, plays a crucial role in the interpretation and analysis of human speech. Most existing…

Machine Learning · Computer Science · 2024-08-23 · Luyao Cheng, Hui Wang, Siqi Zheng, Yafeng Chen, Rongjie Huang, Qinglin Zhang, Qian Chen, Xihao Li

Speech applications dealing with conversations require not only recognizing the spoken words, but also determining who spoke when. The task of assigning words to speakers is typically addressed by merging the outputs of two separate…

Computation and Language · Computer Science · 2019-07-12 · Laurent El Shafey, Hagen Soltau, Izhak Shafran

Diarization is a crucial component in meeting transcription systems to ease the challenges of speech enhancement and attribute the transcriptions to the correct speaker. Particularly in the presence of overlapping or noisy speech, these…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-06-06 · Christoph Boeddeker, Tobias Cord-Landwehr, Reinhold Haeb-Umbach

This paper presents a computationally efficient and distributed speaker diarization framework for networked IoT-style audio devices. The work proposes a Federated Learning model which can identify the participants in a conversation without…

Sound · Computer Science · 2024-12-02 · Amit Kumar Bhuyan, Hrishikesh Dutta, Subir Biswas

Diarization partitions an audio stream into segments based on the voices of the speakers. Real-time diarization systems that include an enrollment step should limit enrollment training samples to reduce user interaction time. Although…

Sound · Computer Science · 2022-08-09 · Dirk Padfield, Daniel J. Liebling

Traditional speech separation and speaker diarization approaches rely on prior knowledge of target speakers or a predetermined number of participants in audio signals. To address these limitations, recent advances focus on developing…

End-to-end speaker diarization enables accurate overlap-aware diarization by jointly estimating multiple speakers' speech activities in parallel. This approach is data-hungry, requiring a large amount of labeled conversational data, which…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-02 · Shota Horiguchi, Atsushi Ando, Marc Delcroix, Naohiro Tawara

Self-supervised speech models such as wav2vec2.0 and WavLM have been shown to significantly improve the performance of many downstream speech tasks, especially in low-resource settings, over the past few years. Despite this, evaluations on…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-12-18 · Séverin Baroudi, Hervé Bredin, Joseph Razik, Ricard Marxer

This paper presents a novel evaluation approach to text-based speaker diarization (SD), tackling the limitations of traditional metrics that do not account for any contextual information in text. Two new metrics are proposed, Text-based…

Computation and Language · Computer Science · 2023-09-15 · Chen Gong, Peilin Wu, Jinho D. Choi

Speaker diarization is the problem of separating speakers in an audio recording. There could be any number of speakers, and the final result should state when each speaker starts and ends. In this project, we analyze a given audio file with 2 channels and 2…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-06-11 · Vishal Sharma, Zekun Zhang, Zachary Neubert, Curtis Dyreson

Over the last few years, deep learning has grown in popularity for speaker verification, identification, and diarization. Inarguably, a significant part of this success is due to the demonstrated effectiveness of their speaker…

Sound · Computer Science · 2022-10-07 · Yehoshua Dissen, Felix Kreuk, Joseph Keshet

In automatic speech processing systems, speaker diarization is a crucial front-end component to separate segments from different speakers. Inspired by the recent success of deep neural networks (DNNs) in semantic inferencing, triplet…

Audio and Speech Processing · Electrical Eng. & Systems · 2018-08-07 · Huan Song, Megan Willi, Jayaraman J. Thiagarajan, Visar Berisha, Andreas Spanias

The media localization industry usually requires a verbatim script of the final film or TV production in order to create subtitles or dubbing scripts in a foreign language. In particular, the verbatim script (i.e. as-broadcast script) must…

Computation and Language · Computer Science · 2023-08-07 · Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico

In this paper we describe a speaker diarization system that enables localization and identification of all speakers present in a conversation or meeting. We propose a novel systematic approach to tackle several long-standing challenges in…

Sound · Computer Science · 2021-07-21 · Siqi Zheng, Weilong Huang, Xianliang Wang, Hongbin Suo, Jinwei Feng, Zhijie Yan

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and sometimes when it was spoken. Recent…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-04-27 · Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li

Speaker diarization is a task to label an audio or video recording with the identity of the speaker at each given time stamp. In this work, we propose a novel machine learning framework to conduct real-time multi-speaker diarization and…

Sound · Computer Science · 2023-02-23 · Baihan Lin, Xinxin Zhang

This work presents self-supervised learning methods for developing monaural speaker-specific (i.e., personalized) speech enhancement models. While generalist models must broadly address many speakers, specialist models can adapt their…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-07-28 · Aswin Sivaraman, Minje Kim

Our objective is to transform a video into a set of discrete audio-visual objects using self-supervised learning. To this end, we introduce a model that uses attention to localize and group sound sources, and optical flow to aggregate…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-11 · Triantafyllos Afouras, Andrew Owens, Joon Son Chung, Andrew Zisserman