Related papers: Implicit spoken language diarization

Implicit Self-supervised Language Representation for Spoken Language Diarization

In a code-switched (CS) scenario, the use of spoken language diarization (LD) as a pre-processing system is essential. Further, the use of implicit frameworks is preferable over the explicit framework, as it can be easily adapted to deal…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-16 Jagabandhu Mishra, S. R. Mahadeva Prasanna

Language Modelling for Speaker Diarization in Telephonic Interviews

The aim of this paper is to investigate the benefit of combining both language and acoustic modelling for speaker diarization. Although conventional systems only use acoustic features, in some scenarios linguistic data contain high…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-31 Miquel India, Javier Hernando, José A. R. Fonollosa

A language model based approach towards large scale and lightweight language identification systems

Multilingual spoken dialogue systems have gained prominence in the recent past necessitating the requirement for a front-end Language Identification (LID) system. Most of the existing LID systems rely on modeling the language discriminative…

Sound · Computer Science 2016-01-26 Brij Mohan Lal Srivastava, Hari Krishna Vydana, Anil Kumar Vuppala, Manish Shrivastava

Speaker Embedding Extraction with Phonetic Information

Speaker embeddings achieve promising results on many speaker verification tasks. Phonetic information, as an important component of speech, is rarely considered in the extraction of speaker embeddings. In this paper, we introduce phonetic…

Sound · Computer Science 2018-06-15 Yi Liu, Liang He, Jia Liu, Michael T. Johnson

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system. Various goals can be achieved with the proposed framework, such as improving the…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-10 Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao

Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

Large language models (LLMs) have shown great promise for capturing contextual information in natural language processing tasks. We propose a novel approach to speaker diarization that incorporates the prowess of LLMs to exploit contextual…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-15 Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam

SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation

In this paper, we present a neural spoken language diarization model that supports an unconstrained span of languages within a single framework. Our approach integrates a learnable query-based architecture grounded in multilingual…

Computation and Language · Computer Science 2025-10-02 Sangmin Lee, Woongjib Choi, Jihyun Kim, Hong-Goo Kang

Speaker Diarization with Lexical Information

This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-16 Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction

Speaker diarization (SD) is typically used with an automatic speech recognition (ASR) system to ascribe speaker labels to recognized words. The conventional approach reconciles outputs from independently optimized ASR and SD systems, where…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-20 Rohit Paturi, Sundararajan Srinivasan, Xiang Li

From Modular to End-to-End Speaker Diarization

Speaker diarization is usually referred to as the task that determines "who spoke when" in a recording. Until a few years ago, all competitive approaches were modular. Systems based on this framework reached state-of-the-art performance…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-15 Federico Landini

An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings

Many mispronunciation detection and diagnosis (MD&D) research approaches try to exploit both the acoustic and linguistic features as input. Yet the improvement of the performance is limited, partially due to the shortage of large amount…

Computation and Language · Computer Science 2022-04-01 Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu

Phonetic Temporal Neural Model for Language Identification

Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID methods, although this…

Computation and Language · Computer Science 2017-08-28 Zhiyuan Tang, Dong Wang, Yixiang Chen, Lantian Li, Andrew Abel

End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of…

Sound · Computer Science 2021-05-06 Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey

Transducer-based language embedding for spoken language identification

The acoustic and linguistic features are important cues for the spoken language identification (LID) task. Recent advanced LID systems mainly use acoustic features that lack the usage of explicit linguistic feature encoding. In this paper,…

Computation and Language · Computer Science 2022-08-01 Peng Shen, Xugang Lu, Hisashi Kawai

Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition

Phonotactic constraints can be employed to distinguish languages by representing a speech utterance as a multinomial distribution or phone events. In the present study, we propose a new learning mechanism based on subspace-based…

Sound · Computer Science 2022-03-30 Hung-Shin Lee, Yu Tsao, Shyh-Kang Jeng, Hsin-Min Wang

AG-LSEC: Audio Grounded Lexical Speaker Error Correction

Speaker Diarization (SD) systems are typically audio-based and operate independently of the ASR system in traditional speech transcription pipelines and can have speaker errors due to SD and/or ASR reconciliation, especially around speaker…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-26 Rohit Paturi, Xiang Li, Sundararajan Srinivasan

Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings

In this paper we propose a new method of speaker diarization that employs a deep learning architecture to learn speaker embeddings. In contrast to the traditional approaches that build their speaker embeddings using manually hand-crafted…

Sound · Computer Science 2017-09-18 Pawel Cyrta, Tomasz Trzciński, Wojciech Stokowiec

Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech

This paper addresses spoken language identification (SLI) and speech recognition of multilingual broadcast and institutional speech, real application scenarios that have been rarely addressed in the SLI literature. Observing that in these…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-14 Martina Valente, Fabio Brugnara, Giovanni Morrone, Enrico Zovato, Leonardo Badino

PLDA-Based Diarization of Telephone Conversations

This paper investigates the application of the probabilistic linear discriminant analysis (PLDA) to speaker diarization of telephone conversations. We introduce using a variational Bayes (VB) approach for inference under a PLDA model for…

Audio and Speech Processing · Electrical Eng. & Systems 2017-10-03 Ahmet E. Bulut, Hakan Demir, Yusuf Ziya Isik, Hakan Erdogan

Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks

While there has been a substantial amount of work in speaker diarization recently, there are few efforts in jointly employing lexical and acoustic information for speaker segmentation. Towards that, we investigate a speaker diarization system…

Audio and Speech Processing · Electrical Eng. & Systems 2018-05-29 Tae Jin Park, Panayiotis Georgiou