Related papers: Cross-Modal ASR Post-Processing System for Error C…

ED-CEC: improving rare word recognition using ASR postprocessing based on error detection and context-aware error correction

Automatic speech recognition (ASR) systems often encounter difficulties in accurately recognizing rare words, leading to errors that can have a negative impact on downstream tasks such as keyword spotting, intent detection, and text…

Artificial Intelligence · Computer Science 2023-10-10 Jiajun He, Zekun Yang, Tomoki Toda

Tag and correct: high precision post-editing approach to correction of speech recognition errors

This paper presents a new approach to the problem of correcting speech recognition errors by means of post-editing. It consists of using a neural sequence tagger that learns how to correct an ASR (Automatic Speech Recognition) hypothesis…

Computation and Language · Computer Science 2024-06-13 Tomasz Ziętkiewicz

Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has a Transformer-based encoder-decoder architecture which "translates" ASR model output into grammatically and…

Computation and Language · Computer Science 2019-10-24 Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg

Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

Recent advances in machine learning have demonstrated that multi-modal pre-training can improve automatic speech recognition (ASR) performance compared to randomly initialized models, even when models are fine-tuned on uni-modal tasks.…

Computation and Language · Computer Science 2024-04-01 Yash Jain, David Chan, Pranav Dheram, Aparna Khare, Olabanji Shonibare, Venkatesh Ravichandran, Shalini Ghosh

Improving Readability for Automatic Speech Recognition Transcription

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to grammatical errors, disfluency, and other…

Computation and Language · Computer Science 2020-04-10 Junwei Liao, Sefik Emre Eskimez, Liyang Lu, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to disfluency, filler words, and other errata…

Computation and Language · Computer Science 2021-02-23 Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng

Mixture Encoder for Joint Speech Separation and Recognition

Multi-speaker automatic speech recognition (ASR) is crucial for many real-world applications, but it requires dedicated modeling techniques. Existing approaches can be divided into modular and end-to-end methods. Modular approaches separate…

Computation and Language · Computer Science 2023-06-22 Simon Berger, Peter Vieting, Christoph Boeddeker, Ralf Schlüter, Reinhold Haeb-Umbach

Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition

We propose cross-modal transformer-based neural correction models that refine the output of an automatic speech recognition (ASR) system so as to exclude ASR errors. Generally, neural correction models are composed of encoder-decoder…

Computation and Language · Computer Science 2021-07-06 Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima

Multimodal Grounding for Sequence-to-Sequence Speech Recognition

Humans are capable of processing speech by making use of multiple sensory modalities. For example, the environment where a conversation takes place generally provides semantic and/or acoustic context that helps us to resolve ambiguities or…

Computation and Language · Computer Science 2019-02-21 Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze

A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation

We present a frontend for improving robustness of automatic speech recognition (ASR), that jointly implements three modules within a single model: acoustic echo cancellation, speech enhancement, and speech separation. This is achieved by…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-22 Tom O'Malley, Arun Narayanan, Quan Wang, Alex Park, James Walker, Nathan Howard

Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition

Accurately finding the wrong words in the automatic speech recognition (ASR) hypothesis and recovering them in a well-founded way is the goal of speech error correction. In this paper, we propose a non-autoregressive speech error correction method.…

Computation and Language · Computer Science 2024-07-19 Yuchun Shu, Bo Hu, Yifeng He, Hao Shi, Longbiao Wang, Jianwu Dang

Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation

Fine-tuning pretrained language models (LMs) is a popular approach to automatic speech recognition (ASR) error detection during post-processing. While error detection systems often take advantage of statistical language archetypes captured…

Computation and Language · Computer Science 2021-08-05 Seongmin Park, Dongchan Shin, Sangyoun Paik, Subong Choi, Alena Kazakova, Jihwa Lee

Audio-attention discriminative language model for ASR rescoring

End-to-end approaches for automatic speech recognition (ASR) benefit from directly modeling the probability of the word sequence given the input audio stream in a single neural network. However, compared to conventional ASR systems, these…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-19 Ankur Gandhe, Ariya Rastrow

Error Correction in ASR using Sequence-to-Sequence Models

Post-editing in Automatic Speech Recognition (ASR) entails automatically correcting common and systematic errors produced by the ASR system. The outputs of an ASR system are largely prone to phonetic and spelling errors. In this paper, we…

Computation and Language · Computer Science 2022-08-24 Samrat Dutta, Shreyansh Jain, Ayush Maheshwari, Souvik Pal, Ganesh Ramakrishnan, Preethi Jyothi

PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction

Speech-to-text errors made by automatic speech recognition (ASR) systems negatively impact downstream models. Error correction models as a post-processing text editing method have been recently developed for refining the ASR outputs.…

Computation and Language · Computer Science 2023-06-22 Ziji Zhang, Zhehui Wang, Rajesh Kamma, Sharanya Eswaran, Narayanan Sadagopan

Hybrid phonetic-neural model for correction in speech recognition systems

Automatic speech recognition (ASR) is a relevant area in multiple settings because it provides a natural communication mechanism between applications and users. ASRs often fail in environments that use language specific to particular…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-16 Rafael Viana-Cámara, Mario Campos-Soberanis, Diego Campos-Sobrino

Breaking the Data Barrier: Towards Robust Speech Translation via Adversarial Stability Training

In a pipeline speech translation system, the automatic speech recognition (ASR) system passes its recognition errors to the downstream machine translation (MT) system. A standard machine translation system is usually trained on parallel…

Computation and Language · Computer Science 2019-10-29 Qiao Cheng, Meiyuan Fang, Yaqian Han, Jin Huang, Yitao Duan

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-06 Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li

Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example pruning words due to acoustics using short-term context, prior to rescoring with…

Computation and Language · Computer Science 2019-07-01 Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou

Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement

Speech enhancement (SE) systems are typically evaluated using a variety of instrumental metrics. The use of automatic speech recognition (ASR) systems to evaluate SE performance is common in the literature, usually in terms of word error rate…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-13 Danilo de Oliveira, Tal Peer, Timo Gerkmann