Related papers: Error Correction in ASR using Sequence-to-Sequence…

200 papers

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has a Transformer-based encoder-decoder architecture that "translates" ASR model output into grammatically and…

Computation and Language · Computer Science 2019-10-24 Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg

This paper presents a new approach to the problem of correcting speech recognition errors by means of post-editing. It consists of using a neural sequence tagger that learns how to correct an ASR (Automatic Speech Recognition) hypothesis…

Computation and Language · Computer Science 2024-06-13 Tomasz Ziętkiewicz

Automatic speech recognition (ASR) is a relevant area in multiple settings because it provides a natural communication mechanism between applications and users. ASRs often fail in environments that use language specific to particular…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-16 Rafael Viana-Cámara, Mario Campos-Soberanis, Diego Campos-Sobrino

Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER) than original ASR outputs. Previous works usually use a sequence-to-sequence…

Computation and Language · Computer Science 2022-11-30 Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu

Although automatic speech recognition (ASR) systems have achieved significant improvements in recent years, spoken language recognition errors still occur that can be easily spotted by human beings. Various language modeling techniques have been…

Computation and Language · Computer Science 2021-12-21 Yun Zhao, Xuerui Yang, Jinchao Wang, Yongyu Gao, Chao Yan, Yuanfu Zhou

Sequence-to-sequence models, such as attention-based models in automatic speech recognition (ASR), are typically trained to optimize the cross-entropy criterion which corresponds to improving the log-likelihood of the data. However, system…

Computation and Language · Computer Science 2017-12-06 Rohit Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Kannan

End-to-end approaches for automatic speech recognition (ASR) benefit from directly modeling the probability of the word sequence given the input audio stream in a single neural network. However, compared to conventional ASR systems, these…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-19 Ankur Gandhe, Ariya Rastrow

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models…

Audio and Speech Processing · Electrical Eng. & Systems 2018-02-16 Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao

Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and have a strong dependency on…

Computation and Language · Computer Science 2024-01-12 Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang

Although modern automatic speech recognition (ASR) systems can achieve high performance, they may produce errors that weaken readers' experience and do harm to downstream tasks. To improve the accuracy and reliability of ASR hypotheses, we…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-11 Jing Du, Shiliang Pu, Qinbo Dong, Chao Jin, Xin Qi, Dian Gu, Ru Wu, Hongwei Zhou

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language, and pronunciation models); for example, pruning words due to acoustics using short-term context, prior to rescoring with…

Computation and Language · Computer Science 2019-07-01 Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou

Speech-to-text errors made by automatic speech recognition (ASR) systems negatively impact downstream models. Error correction models as a post-processing text editing method have been recently developed for refining the ASR outputs.…

Computation and Language · Computer Science 2023-06-22 Ziji Zhang, Zhehui Wang, Rajesh Kamma, Sharanya Eswaran, Narayanan Sadagopan

Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems, which aims to achieve better recognition performance by biasing the ASR system to particular context phrases such as person…

Computation and Language · Computer Science 2022-09-08 Xiaoqiang Wang, Yanqing Liu, Jinyu Li, Veljko Miljanic, Sheng Zhao, Hosam Khalil

ASR, short for Automatic Speech Recognition, is the process of converting spoken speech into text that can be manipulated by a computer. Although ASR has several applications, it is still erroneous and imprecise, especially if used in a…

Computation and Language · Computer Science 2012-03-26 Youssef Bassil, Mohammad Alwani

Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network and require only parallel audio-text pairs. Thus, the language…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-20 Jinxi Guo, Tara N. Sainath, Ron J. Weiss

Automatic Speech Recognition (ASR) systems exhibit the best performance on speech that is similar to that on which they were trained. As such, underrepresented varieties including regional dialects, minority speakers, and low-resource…

Computation and Language · Computer Science 2023-05-15 Emma O'Neill, Julie Carson-Berndsen

We propose a new method for the calculation of error rates in Automatic Speech Recognition (ASR). This new metric is for languages that contain half characters and where the same character can be written in different forms. We implement our…

Computation and Language · Computer Science 2022-06-16 Priyanshi Shah, Harveen Singh Chadha, Anirudh Gupta, Ankur Dhuriya, Neeraj Chhimwal, Rishabh Gaur, Vivek Raghavan

Speech-enabled systems typically first convert audio to text through an automatic speech recognition (ASR) model and then feed the text to downstream natural language processing (NLP) modules. The errors of the ASR system can seriously…

Computation and Language · Computer Science 2021-03-26 Tong Cui, Jinghui Xiao, Liangyou Li, Xin Jiang, Qun Liu

Humans are capable of processing speech by making use of multiple sensory modalities. For example, the environment where a conversation takes place generally provides semantic and/or acoustic context that helps us to resolve ambiguities or…

Computation and Language · Computer Science 2019-02-21 Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze

Automatic speech recognition (ASR) systems often encounter difficulties in accurately recognizing rare words, leading to errors that can have a negative impact on downstream tasks such as keyword spotting, intent detection, and text…

Artificial Intelligence · Computer Science 2023-10-10 Jiajun He, Zekun Yang, Tomoki Toda