Related papers: Error Correction in ASR using Sequence-to-Sequence…

Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has a Transformer-based encoder-decoder architecture which "translates" ASR model output into grammatically and…

Computation and Language · Computer Science 2019-10-24 Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg

Tag and correct: high precision post-editing approach to correction of speech recognition errors

This paper presents a new approach to the problem of correcting speech recognition errors by means of post-editing. It consists of using a neural sequence tagger that learns how to correct an ASR (Automatic Speech Recognition) hypothesis…

Computation and Language · Computer Science 2024-06-13 Tomasz Ziętkiewicz

Hybrid phonetic-neural model for correction in speech recognition systems

Automatic speech recognition (ASR) is a relevant area in multiple settings because it provides a natural communication mechanism between applications and users. ASRs often fail in environments that use language specific to particular…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-16 Rafael Viana-Cámara, Mario Campos-Soberanis, Diego Campos-Sobrino

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER) than original ASR outputs. Previous works usually use a sequence-to-sequence…

Computation and Language · Computer Science 2022-11-30 Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu

BART based semantic correction for Mandarin automatic speech recognition system

Although automatic speech recognition (ASR) systems have achieved significant improvements in recent years, spoken language recognition errors still occur that can be easily spotted by human beings. Various language modeling techniques have been…

Computation and Language · Computer Science 2021-12-21 Yun Zhao, Xuerui Yang, Jinchao Wang, Yongyu Gao, Chao Yan, Yuanfu Zhou

Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models

Sequence-to-sequence models, such as attention-based models in automatic speech recognition (ASR), are typically trained to optimize the cross-entropy criterion which corresponds to improving the log-likelihood of the data. However, system…

Computation and Language · Computer Science 2017-12-06 Rohit Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan

Audio-attention discriminative language model for ASR rescoring

End-to-end approaches for automatic speech recognition (ASR) benefit from directly modeling the probability of the word sequence given the input audio stream in a single neural network. However, compared to conventional ASR systems, these…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-19 Ankur Gandhe, Ariya Rastrow

Multilingual Speech Recognition With A Single End-To-End Model

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models…

Audio and Speech Processing · Electrical Eng. & Systems 2018-02-16 Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao

UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and have a strong dependency on…

Computation and Language · Computer Science 2024-01-12 Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang

Cross-Modal ASR Post-Processing System for Error Correction and Utterance Rejection

Although modern automatic speech recognition (ASR) systems can achieve high performance, they may produce errors that weaken readers' experience and do harm to downstream tasks. To improve the accuracy and reliability of ASR hypotheses, we…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-11 Jing Du, Shiliang Pu, Qinbo Dong, Chao Jin, Xin Qi, Dian Gu, Ru Wu, Hongwei Zhou

Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example pruning words due to acoustics using short-term context, prior to rescoring with…

Computation and Language · Computer Science 2019-07-01 Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou

PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction

Speech-to-text errors made by automatic speech recognition (ASR) systems negatively impact downstream models. Error correction models as a post-processing text editing method have been recently developed for refining the ASR outputs.…

Computation and Language · Computer Science 2023-06-22 Ziji Zhang, Zhehui Wang, Rajesh Kamma, Sharanya Eswaran, Narayanan Sadagopan

Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems

Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems, which aims to achieve better recognition performance by biasing the ASR system to particular context phrases such as person…

Computation and Language · Computer Science 2022-09-08 Xiaoqiang Wang, Yanqing Liu, Jinyu Li, Veljko Miljanic, Sheng Zhao, Hosam Khalil

Post-Editing Error Correction Algorithm for Speech Recognition using Bing Spelling Suggestion

ASR, short for Automatic Speech Recognition, is the process of converting spoken speech into text that can be manipulated by a computer. Although ASR has several applications, it is still erroneous and imprecise, especially if used in a…

Computation and Language · Computer Science 2012-03-26 Youssef Bassil, Mohammad Alwani

A spelling correction model for end-to-end speech recognition

Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network and require only parallel audio-text pairs. Thus, the language…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-20 Jinxi Guo, Tara N. Sainath, Ron J. Weiss

Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes

Automatic Speech Recognition (ASR) systems exhibit the best performance on speech that is similar to that on which they were trained. As such, underrepresented varieties including regional dialects, minority-speakers, and low-resource…

Computation and Language · Computer Science 2023-05-15 Emma O'Neill, Julie Carson-Berndsen

Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages?

We propose a new method for the calculation of error rates in Automatic Speech Recognition (ASR). This new metric is for languages that contain half characters and where the same character can be written in different forms. We implement our…

Computation and Language · Computer Science 2022-06-16 Priyanshi Shah, Harveen Singh Chadha, Anirudh Gupta, Ankur Dhuriya, Neeraj Chhimwal, Rishabh Gaur, Vivek Raghavan

An Approach to Improve Robustness of NLP Systems against ASR Errors

Speech-enabled systems typically first convert audio to text through an automatic speech recognition (ASR) model and then feed the text to downstream natural language processing (NLP) modules. The errors of the ASR system can seriously…

Computation and Language · Computer Science 2021-03-26 Tong Cui, Jinghui Xiao, Liangyou Li, Xin Jiang, Qun Liu

Multimodal Grounding for Sequence-to-Sequence Speech Recognition

Humans are capable of processing speech by making use of multiple sensory modalities. For example, the environment where a conversation takes place generally provides semantic and/or acoustic context that helps us to resolve ambiguities or…

Computation and Language · Computer Science 2019-02-21 Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze

ED-CEC: Improving Rare Word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction

Automatic speech recognition (ASR) systems often encounter difficulties in accurately recognizing rare words, leading to errors that can have a negative impact on downstream tasks such as keyword spotting, intent detection, and text…

Artificial Intelligence · Computer Science 2023-10-10 Jiajun He, Zekun Yang, Tomoki Toda