Related papers: Robust ASR Error Correction with Conservative Data…

ASR Error Correction using Large Language Models

Error correction (EC) models play a crucial role in refining Automatic Speech Recognition (ASR) transcriptions, enhancing the readability and quality of transcriptions. Without requiring access to the underlying code or model weights, EC…

Computation and Language · Computer Science 2025-01-22 Rao Ma, Mengjie Qian, Mark Gales, Kate Knill

Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation

We previously proposed contextual spelling correction (CSC) to correct the output of end-to-end (E2E) automatic speech recognition (ASR) models with contextual information such as name, place, etc. Although CSC has achieved reasonable…

Sound · Computer Science 2023-02-23 Xiaoqiang Wang, Yanqing Liu, Jinyu Li, Sheng Zhao

Generative error correction for code-switching speech recognition using large language models

Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence. Despite the recent advances in automatic speech recognition (ASR), CS-ASR is still a challenging task owing to the grammatical…

Computation and Language · Computer Science 2023-10-23 Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Hexin Liu, Sabato Marco Siniscalchi, Eng Siong Chng

A spelling correction model for end-to-end speech recognition

Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network and require only parallel audio-text pairs. Thus, the language…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-20 Jinxi Guo, Tara N. Sainath, Ron J. Weiss

Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition

The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that speech recognition (ASR) systems handle mixed language. Designing a CS-ASR system has many challenges, mainly due to data scarcity, grammatical…

Computation and Language · Computer Science 2023-01-12 Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur

ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction

Automatic Speech Recognition (ASR) is a fundamental and important task in the field of speech and natural language processing. It is an inherent building block in many applications such as voice assistants, speech translation, etc. Despite…

Computation and Language · Computer Science 2024-12-05 Victor Junqiu Wei, Weicheng Wang, Di Jiang, Yuanfeng Song, Lu Wang

Which Data Matter? Embedding-Based Data Selection for Speech Recognition

Modern ASR systems are typically trained on large-scale pseudo-labeled, in-the-wild data spanning multiple domains. While such heterogeneous data benefit generalist models designed for broad deployment, they pose challenges for specialist…

Sound · Computer Science 2026-03-16 Zakaria Aldeneh, Skyler Seto, Maureen de Seyssel, Jie Chi, Zijin Gu, Takuya Higuchi, Jee-weon Jung, Shinji Watanabe, David Grangier, Barry-John Theobald, Tatiana Likhomanenko

Visual Information Matters for ASR Error Correction

Aiming to improve the Automatic Speech Recognition (ASR) outputs with a post-processing step, ASR error correction (EC) techniques have been widely developed due to their efficiency in using parallel text data. Previous works mainly focus…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-29 Vanya Bannihatti Kumar, Shanbo Cheng, Ningxin Peng, Yuchen Zhang

Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition

Accurately locating the wrong words in an automatic speech recognition (ASR) hypothesis and reliably recovering the correct ones is the goal of speech error correction. In this paper, we propose a non-autoregressive speech error correction method.…

Computation and Language · Computer Science 2024-07-19 Yuchun Shu, Bo Hu, Yifeng He, Hao Shi, Longbiao Wang, Jianwu Dang

Continuous Learning for Children's ASR: Overcoming Catastrophic Forgetting with Elastic Weight Consolidation and Synaptic Intelligence

In this work, we present the first study addressing automatic speech recognition (ASR) for children in an online learning setting. This is particularly important for both child-centric applications and the privacy protection of minors,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-07 Edem Ahadzi, Vishwanath Pratap Singh, Tomi Kinnunen, Ville Hautamaki

CTC-Assisted LLM-Based Contextual ASR

Contextual ASR or hotword customization holds substantial practical value. Despite the impressive performance of current end-to-end (E2E) automatic speech recognition (ASR) systems, they often face challenges in accurately recognizing rare…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-12 Guanrou Yang, Ziyang Ma, Zhifu Gao, Shiliang Zhang, Xie Chen

Cycle-consistency training for end-to-end speech recognition

This paper presents a method to train end-to-end automatic speech recognition (ASR) models using unpaired data. Although the end-to-end approach can eliminate the need for expert knowledge such as pronunciation dictionaries to build ASR…

Computation and Language · Computer Science 2019-05-24 Takaaki Hori, Ramon Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, Jonathan Le Roux

UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and have a strong dependency on…

Computation and Language · Computer Science 2024-01-12 Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang

Aligning Speech to Languages to Enhance Code-switching Speech Recognition

Code-switching (CS) refers to the switching of languages within a speech signal and results in language confusion for automatic speech recognition (ASR). To address language confusion, we propose a language alignment loss (LAL) that aligns…

Audio and Speech Processing · Electrical Eng. & Systems 2025-11-04 Hexin Liu, Xiangyu Zhang, Haoyang Zhang, Leibny Paola Garcia, Andy W. H. Khong, Eng Siong Chng, Shinji Watanabe

Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition

This paper proposes an adaptation method for end-to-end speech recognition. In this method, multiple automatic speech recognition (ASR) 1-best hypotheses are integrated in the computation of the connectionist temporal classification (CTC)…

Computation and Language · Computer Science 2021-04-01 Cong-Thanh Do, Rama Doddipatla, Thomas Hain

Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications

Effective communication in Air Traffic Control (ATC) is critical to maintaining aviation safety, yet the challenges posed by accented English remain largely unaddressed in Automatic Speech Recognition (ASR) systems. Existing models struggle…

Machine Learning · Computer Science 2025-02-28 Marcus Yu Zhe Wee, Justin Juin Hng Wong, Lynus Lim, Joe Yu Wei Tan, Prannaya Gupta, Dillion Lim, En Hao Tew, Aloysius Keng Siew Han, Yong Zhi Lim

Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition

Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data. However, human annotators usually perform "non-verbatim" transcription, which can result in poorly trained models. In this paper, we…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-28 Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola Garcia Perera, Daniel Povey, Sanjeev Khudanpur

A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data

We consider the problem of recognizing speech utterances spoken to a device which is generating a known sound waveform; for example, recognizing queries issued to a digital assistant which is generating responses to previous user inputs.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-03 Nathan Howard, Alex Park, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Rohit Prabhavalkar

Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation

We present an approach to reduce the performance disparity between geographic regions without degrading performance on the overall user population for ASR. A popular approach is to fine-tune the model with data from regions where the ASR…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-09 Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas

Contextual Speech Recognition with Difficult Negative Training Examples

Improving the representation of contextual information is key to unlocking the potential of end-to-end (E2E) automatic speech recognition (ASR). In this work, we present a novel and simple approach for training an ASR context mechanism with…

Audio and Speech Processing · Electrical Eng. & Systems 2018-10-30 Uri Alon, Golan Pundak, Tara N. Sainath