Related papers: CPPF: A contextual and post-processing-free model …

Improving Readability for Automatic Speech Recognition Transcription

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to grammatical errors, disfluency, and other…

Computation and Language · Computer Science 2020-04-10 Junwei Liao , Sefik Emre Eskimez , Liyang Lu , Yu Shi , Ming Gong , Linjun Shou , Hong Qu , Michael Zeng

Cross-Modal ASR Post-Processing System for Error Correction and Utterance Rejection

Although modern automatic speech recognition (ASR) systems can achieve high performance, they may produce errors that weaken readers' experience and do harm to downstream tasks. To improve the accuracy and reliability of ASR hypotheses, we…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-11 Jing Du , Shiliang Pu , Qinbo Dong , Chao Jin , Xin Qi , Dian Gu , Ru Wu , Hongwei Zhou

Speech LLMs are Contextual Reasoning Transcribers

Despite extensions to speech inputs, effectively leveraging the rich knowledge and contextual understanding of large language models (LLMs) in automatic speech recognition (ASR) remains non-trivial, as the task primarily involves direct…

Computation and Language · Computer Science 2026-04-02 Keqi Deng , Ruchao Fan , Bo Ren , Yiming Wang , Jinyu Li

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to disfluency, filter words, and other errata…

Computation and Language · Computer Science 2021-02-23 Junwei Liao , Yu Shi , Ming Gong , Linjun Shou , Sefik Eskimez , Liyang Lu , Hong Qu , Michael Zeng

Attention-based Contextual Language Model Adaptation for Speech Recognition

Language modeling (LM) for automatic speech recognition (ASR) does not usually incorporate utterance level contextual information. For some domains like voice assistants, however, additional context, such as the time at which an utterance…

Computation and Language · Computer Science 2021-06-04 Richard Diehl Martinez , Scott Novotney , Ivan Bulyko , Ariya Rastrow , Andreas Stolcke , Ankur Gandhe

Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation

Fine-tuning pretrained language models (LMs) is a popular approach to automatic speech recognition (ASR) error detection during post-processing. While error detection systems often take advantage of statistical language archetypes captured…

Computation and Language · Computer Science 2021-08-05 Seongmin Park , Dongchan Shin , Sangyoun Paik , Subong Choi , Alena Kazakova , Jihwa Lee

Conversational Speech Recognition By Learning Conversation-level Characteristics

Conversational automatic speech recognition (ASR) is a task to recognize conversational speech including multiple speakers. Unlike sentence-level ASR, conversational ASR can naturally take advantages from specific characteristics of…

Sound · Computer Science 2022-02-18 Kun Wei , Yike Zhang , Sining Sun , Lei Xie , Long Ma

An Effective Training Framework for Light-Weight Automatic Speech Recognition Models

Recent advancement in deep learning encouraged developing large automatic speech recognition (ASR) models that achieve promising results while ignoring computational and memory constraints. However, deploying such models on low resource…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Abdul Hannan , Alessio Brutti , Shah Nawaz , Mubashir Noman

CMT-LLM: Contextual Multi-Talker ASR Utilizing Large Language Models

In real-world applications, automatic speech recognition (ASR) systems must handle overlapping speech from multiple speakers and recognize rare words like technical terms. Traditional methods address multi-talker ASR and contextual biasing…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-17 Jiajun He , Naoki Sawada , Koichi Miyazaki , Tomoki Toda

Towards Unsupervised Speech Recognition Without Pronunciation Models

Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech…

Computation and Language · Computer Science 2025-01-10 Junrui Ni , Liming Wang , Yang Zhang , Kaizhi Qian , Heting Gao , Mark Hasegawa-Johnson , Chang D. Yoo

Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM

Automatic speech recognition (ASR) models rely on high-quality transcribed data for effective training. Generating pseudo-labels for large unlabeled audio datasets often relies on complex pipelines that combine multiple ASR outputs through…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-06 Jeena Prakash , Blessingh Kumar , Kadri Hacioglu , Bidisha Sharma , Sindhuja Gopalan , Malolan Chetlur , Shankar Venkatesan , Andreas Stolcke

Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has Transformer-based encoder-decoder architecture which "translates" ASR model output into grammatically and…

Computation and Language · Computer Science 2019-10-24 Oleksii Hrinchuk , Mariya Popova , Boris Ginsburg

Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study

This paper explores the integration of Large Language Models (LLMs) into Automatic Speech Recognition (ASR) systems to improve transcription accuracy. The increasing sophistication of LLMs, with their in-context learning capabilities and…

Computation and Language · Computer Science 2025-06-03 Zeping Min , Jinbo Wang

App for Resume-Based Job Matching with Speech Interviews and Grammar Analysis: A Review

Through the advancement in natural language processing (NLP), specifically in speech recognition, fully automated complex systems functioning on voice input have started proliferating in areas such as home automation. These systems have…

Computation and Language · Computer Science 2023-11-28 Tanmay Kulkarni , Yuvraj Pardeshi , Yash Shah , Vaishnvi Sakat , Sapana Bhirud

Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR

Speech foundation models have achieved state-of-the-art (SoTA) performance across various tasks, such as automatic speech recognition (ASR) in hundreds of languages. However, multi-speaker ASR remains a challenging task for these models due…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-04 Weiqing Wang , Kunal Dhawan , Taejin Park , Krishna C. Puvvada , Ivan Medennikov , Somshubra Majumdar , He Huang , Jagadeesh Balam , Boris Ginsburg

ASDF: A Differential Testing Framework for Automatic Speech Recognition Systems

Recent years have witnessed wider adoption of Automated Speech Recognition (ASR) techniques in various domains. Consequently, evaluating and enhancing the quality of ASR systems is of great importance. This paper proposes ASDF, an Automated…

Audio and Speech Processing · Electrical Eng. & Systems 2023-02-14 Daniel Hao Xian Yuen , Andrew Yong Chen Pang , Zhou Yang , Chun Yong Chong , Mei Kuan Lim , David Lo

Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages

Automatic speech recognition systems have undoubtedly advanced with the integration of multilingual and multitask models such as Whisper, which have shown a promising ability to understand and process speech across a wide range of…

Computation and Language · Computer Science 2025-04-14 Xabier de Zuazo , Eva Navas , Ibon Saratxaga , Inma Hernáez Rioja

Speech Recognition by Simply Fine-tuning BERT

We propose a simple method for automatic speech recognition (ASR) by fine-tuning BERT, which is a language model (LM) trained on large-scale unlabeled text data and can generate rich contextual representations. Our assumption is that given…

Sound · Computer Science 2021-02-02 Wen-Chin Huang , Chia-Hua Wu , Shang-Bao Luo , Kuan-Yu Chen , Hsin-Min Wang , Tomoki Toda

Audio-visual Multi-channel Recognition of Overlapped Speech

Automatic speech recognition (ASR) of overlapped speech remains a highly challenging task to date. To this end, multi-channel microphone array data are widely used in state-of-the-art ASR systems. Motivated by the invariance of visual…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-19 Jianwei Yu , Bo Wu , Rongzhi Gu , Shi-Xiong Zhang , Lianwu Chen , Yong Xu , Meng Yu , Dan Su , Dong Yu , Xunying Liu , Helen Meng

Learning ASR-Robust Contextualized Embeddings for Spoken Language Understanding

Employing pre-trained language models (LM) to extract contextualized word representations has achieved state-of-the-art performance on various NLP tasks. However, applying this technique to noisy transcripts generated by automatic speech…

Computation and Language · Computer Science 2020-11-03 Chao-Wei Huang , Yun-Nung Chen