Related papers: Implicit spoken language diarization

200 papers

In a code-switched (CS) scenario, the use of spoken language diarization (LD) as a pre-processing system is essential. Further, the use of implicit frameworks is preferable over the explicit framework, as it can be easily adapted to deal…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-07-16 · Jagabandhu Mishra, S. R. Mahadeva Prasanna

The aim of this paper is to investigate the benefit of combining both language and acoustic modelling for speaker diarization. Although conventional systems only use acoustic features, in some scenarios linguistic data contain high…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-01-31 · Miquel India, Javier Hernando, José A. R. Fonollosa

Multilingual spoken dialogue systems have gained prominence in the recent past necessitating the requirement for a front-end Language Identification (LID) system. Most of the existing LID systems rely on modeling the language discriminative…

Speaker embeddings achieve promising results on many speaker verification tasks. Phonetic information, as an important component of speech, is rarely considered in the extraction of speaker embeddings. In this paper, we introduce phonetic…

Sound · Computer Science · 2018-06-15 · Yi Liu, Liang He, Jia Liu, Michael T. Johnson

In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system. Various goals can be achieved with the proposed framework, such as improving the…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-01-10 · Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao

Large language models (LLMs) have shown great promise for capturing contextual information in natural language processing tasks. We propose a novel approach to speaker diarization that incorporates the prowess of LLMs to exploit contextual…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-09-15 · Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam

In this paper, we present a neural spoken language diarization model that supports an unconstrained span of languages within a single framework. Our approach integrates a learnable query-based architecture grounded in multilingual…

Computation and Language · Computer Science · 2025-10-02 · Sangmin Lee, Woongjib Choi, Jihyun Kim, Hong-Goo Kang

This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-04-16 · Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

Speaker diarization (SD) is typically used with an automatic speech recognition (ASR) system to ascribe speaker labels to recognized words. The conventional approach reconciles outputs from independently optimized ASR and SD systems, where…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-06-20 · Rohit Paturi, Sundararajan Srinivasan, Xiang Li

Speaker diarization is usually referred to as the task that determines "who spoke when" in a recording. Until a few years ago, all competitive approaches were modular. Systems based on this framework reached state-of-the-art performance…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-07-15 · Federico Landini

Many mispronunciation detection and diagnosis (MD&D) research approaches try to exploit both the acoustic and linguistic features as input. Yet the improvement of the performance is limited, partially due to the shortage of large amount…

Computation and Language · Computer Science · 2022-04-01 · Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu

Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID methods, although this…

Computation and Language · Computer Science · 2017-08-28 · Zhiyuan Tang, Dong Wang, Yixiang Chen, Lantian Li, Andrew Abel

We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of…

Sound · Computer Science · 2021-05-06 · Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey

The acoustic and linguistic features are important cues for the spoken language identification (LID) task. Recent advanced LID systems mainly use acoustic features that lack the usage of explicit linguistic feature encoding. In this paper,…

Computation and Language · Computer Science · 2022-08-01 · Peng Shen, Xugang Lu, Hisashi Kawai

Phonotactic constraints can be employed to distinguish languages by representing a speech utterance as a multinomial distribution or phone events. In the present study, we propose a new learning mechanism based on subspace-based…

Sound · Computer Science · 2022-03-30 · Hung-Shin Lee, Yu Tsao, Shyh-Kang Jeng, Hsin-Min Wang

Speaker Diarization (SD) systems are typically audio-based and operate independently of the ASR system in traditional speech transcription pipelines and can have speaker errors due to SD and/or ASR reconciliation, especially around speaker…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-06-26 · Rohit Paturi, Xiang Li, Sundararajan Srinivasan

In this paper we propose a new method of speaker diarization that employs a deep learning architecture to learn speaker embeddings. In contrast to the traditional approaches that build their speaker embeddings using manually hand-crafted…

Sound · Computer Science · 2017-09-18 · Pawel Cyrta, Tomasz Trzciński, Wojciech Stokowiec

This paper addresses spoken language identification (SLI) and speech recognition of multilingual broadcast and institutional speech, real application scenarios that have been rarely addressed in the SLI literature. Observing that in these…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-06-14 · Martina Valente, Fabio Brugnara, Giovanni Morrone, Enrico Zovato, Leonardo Badino

This paper investigates the application of the probabilistic linear discriminant analysis (PLDA) to speaker diarization of telephone conversations. We introduce using a variational Bayes (VB) approach for inference under a PLDA model for…

Audio and Speech Processing · Electrical Eng. & Systems · 2017-10-03 · Ahmet E. Bulut, Hakan Demir, Yusuf Ziya Isik, Hakan Erdogan

While there has been a substantial amount of work in speaker diarization recently, there are few efforts in jointly employing lexical and acoustic information for speaker segmentation. Towards that, we investigate a speaker diarization system…

Audio and Speech Processing · Electrical Eng. & Systems · 2018-05-29 · Tae Jin Park, Panayiotis Georgiou