Related papers: Generating Human Readable Transcript for Automatic…

Improving Readability for Automatic Speech Recognition Transcription

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript can still be challenging to read due to grammatical errors, disfluency, and other…

Computation and Language · Computer Science 2020-04-10 Junwei Liao, Sefik Emre Eskimez, Liyang Lu, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has Transformer-based encoder-decoder architecture which "translates" ASR model output into grammatically and…

Computation and Language · Computer Science 2019-10-24 Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg

Cross-Modal ASR Post-Processing System for Error Correction and Utterance Rejection

Although modern automatic speech recognition (ASR) systems can achieve high performance, they may produce errors that weaken readers' experience and do harm to downstream tasks. To improve the accuracy and reliability of ASR hypotheses, we…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-11 Jing Du, Shiliang Pu, Qinbo Dong, Chao Jin, Xin Qi, Dian Gu, Ru Wu, Hongwei Zhou

Breaking the Data Barrier: Towards Robust Speech Translation via Adversarial Stability Training

In a pipeline speech translation system, the automatic speech recognition (ASR) system will transmit recognition errors to the downstream machine translation (MT) system. A standard machine translation system is usually trained on parallel…

Computation and Language · Computer Science 2019-10-29 Qiao Cheng, Meiyuan Fang, Yaqian Han, Jin Huang, Yitao Duan

Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement

Speech enhancement (SE) systems are typically evaluated using a variety of instrumental metrics. The use of automatic speech recognition (ASR) systems to evaluate SE performance is common in the literature, usually in terms of word error rate…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-13 Danilo de Oliveira, Tal Peer, Timo Gerkmann

Text Generation with Speech Synthesis for ASR Data Augmentation

Aiming at reducing the reliance on expensive human annotations, data synthesis for Automatic Speech Recognition (ASR) has remained an active area of research. While prior work mainly focuses on synthetic speech generation for ASR data…

Computation and Language · Computer Science 2023-05-29 Zhuangqun Huang, Gil Keren, Ziran Jiang, Shashank Jain, David Goss-Grubbs, Nelson Cheng, Farnaz Abtahi, Duc Le, David Zhang, Antony D'Avirro, Ethan Campbell-Taylor, Jessie Salas, Irina-Elena Veliche, Xi Chen

Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement

With the surge of online meetings, it has become more critical than ever to provide high-quality speech audio and live captioning under various noise conditions. However, most monaural speech enhancement (SE) models introduce processing…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-08 Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka

Speech Enhancement Modeling Towards Robust Speech Recognition System

For about four decades, human beings have been dreaming of an intelligent machine that can master natural speech. In its simplest form, this machine should consist of two subsystems, namely automatic speech recognition (ASR) and speech…

Sound · Computer Science 2013-05-08 Urmila Shrawankar, V. M. Thakare

Transfer Learning for Robust Low-Resource Children's Speech ASR with Transformers and Source-Filter Warping

Automatic Speech Recognition (ASR) systems are known to exhibit difficulties when transcribing children's speech. This can mainly be attributed to the absence of large children's speech corpora to train robust ASR models and the resulting…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-22 Jenthe Thienpondt, Kris Demuynck

Adapting End-to-End Speech Recognition for Readable Subtitles

Automatic speech recognition (ASR) systems are primarily evaluated on transcription accuracy. However, in some use cases such as subtitling, verbatim transcription would reduce output readability given limited screen size and reading time.…

Computation and Language · Computer Science 2020-05-26 Danni Liu, Jan Niehues, Gerasimos Spanakis

Phoneme-BERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript

Recent years have witnessed significant improvement in the ability of ASR systems to recognize spoken utterances. However, it is still a challenging task for noisy and out-of-domain data, where substitution and deletion errors are prevalent in the…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-17 Mukuntha Narayanan Sundararaman, Ayush Kumar, Jithendra Vepa

Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline

Automatic speech recognition (ASR) outcomes serve as input for downstream tasks, substantially impacting the satisfaction level of end-users. Hence, the diagnosis and enhancement of the vulnerabilities present in the ASR model bear…

Computation and Language · Computer Science 2024-01-29 Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

A Deep Learning System for Domain-specific Speech Recognition

As human-machine voice interfaces provide easy access to increasingly intelligent machines, many state-of-the-art automatic speech recognition (ASR) systems have been proposed. However, commercial ASR systems usually have poor performance on…

Computation and Language · Computer Science 2023-09-28 Yanan Jia

Semantic-WER: A Unified Metric for the Evaluation of ASR Transcript for End Usability

Recent advances in supervised, semi-supervised and self-supervised deep learning algorithms have shown significant improvement in the performance of automatic speech recognition (ASR) systems. The state-of-the-art systems have achieved a…

Computation and Language · Computer Science 2021-10-19 Somnath Roy

Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example pruning words due to acoustics using short-term context, prior to rescoring with…

Computation and Language · Computer Science 2019-07-01 Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou

An Approach to Improve Robustness of NLP Systems against ASR Errors

Speech-enabled systems typically first convert audio to text through an automatic speech recognition (ASR) model and then feed the text to downstream natural language processing (NLP) modules. The errors of the ASR system can seriously…

Computation and Language · Computer Science 2021-03-26 Tong Cui, Jinghui Xiao, Liangyou Li, Xin Jiang, Qun Liu

Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM

Automatic speech recognition (ASR) models rely on high-quality transcribed data for effective training. Generating pseudo-labels for large unlabeled audio datasets often relies on complex pipelines that combine multiple ASR outputs through…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-06 Jeena Prakash, Blessingh Kumar, Kadri Hacioglu, Bidisha Sharma, Sindhuja Gopalan, Malolan Chetlur, Shankar Venkatesan, Andreas Stolcke

Towards Improved Speech Recognition through Optimized Synthetic Data Generation

Supervised training of speech recognition models requires access to transcribed audio data, which often is not possible due to confidentiality issues. Our approach to this problem is to generate synthetic audio from a text-only corpus using…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-01 Yanis Perrin, Gilles Boulianne

Assessing Latency in ASR Systems: A Methodological Perspective for Real-Time Use

Automatic speech recognition (ASR) systems generate real-time transcriptions but often miss nuances that human interpreters capture. While ASR is useful in many contexts, interpreters, who already use ASR tools such as Dragon, add critical…

Sound · Computer Science 2025-10-15 Carlos Arriaga, Alejandro Pozo, Javier Conde, Alvaro Alonso

An Effective Training Framework for Light-Weight Automatic Speech Recognition Models

Recent advances in deep learning have encouraged the development of large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low resource…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Abdul Hannan, Alessio Brutti, Shah Nawaz, Mubashir Noman