Related papers: Prioritizing Speech Test Cases

PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts. However, SA-ASR poses unique challenges…

Audio and Speech Processing · Electrical Eng. & Systems 2023-09-29 Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu

Error-driven Fixed-Budget ASR Personalization for Accented Speakers

We consider the task of personalizing ASR models while being constrained by a fixed budget on recording speaker-specific utterances. Given a speaker and an ASR model, we propose a method of identifying sentences for which the speaker's…

Sound · Computer Science 2021-06-03 Abhijeet Awasthi, Aman Kansal, Sunita Sarawagi, Preethi Jyothi

CrossASR++: A Modular Differential Testing Framework for Automatic Speech Recognition

Developers need to perform adequate testing to ensure the quality of Automatic Speech Recognition (ASR) systems. However, manually collecting required test cases is tedious and time-consuming. Our recent work proposes CrossASR, a…

Software Engineering · Computer Science 2022-01-06 Muhammad Hilmi Asyrofi, Zhou Yang, David Lo

Improving Accent Identification and Accented Speech Recognition Under a Framework of Self-supervised Learning

Recently, self-supervised pre-training has gained success in automatic speech recognition (ASR). However, considering the difference between speech accents in real scenarios, how to identify accents and use accent features to improve ASR is…

Audio and Speech Processing · Electrical Eng. & Systems 2021-09-16 Keqi Deng, Songjun Cao, Long Ma

The RoyalFlush System of Speech Recognition for M2MeT Challenge

This paper describes our RoyalFlush system for the track of multi-speaker automatic speech recognition (ASR) in the M2MeT challenge. We adopted the serialized output training (SOT) based multi-speakers ASR system with large-scale simulation…

Sound · Computer Science 2022-02-25 Shuaishuai Ye, Peiyao Wang, Shunfei Chen, Xinhui Hu, Xinkang Xu

Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection

Identifying mistakes (i.e., miscues) made while reading aloud is commonly approached post-hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR…

Machine Learning · Computer Science 2025-05-30 Griffin Dietz Smith, Dianna Yee, Jennifer King Chen, Leah Findlater

Phoneme-BERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript

Recent years have witnessed significant improvement in ASR systems to recognize spoken utterances. However, it is still a challenging task for noisy and out-of-domain data, where substitution and deletion errors are prevalent in the…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-17 Mukuntha Narayanan Sundararaman, Ayush Kumar, Jithendra Vepa

HypR: A comprehensive study for ASR hypothesis revising with a reference corpus

With the development of deep learning, automatic speech recognition (ASR) has made significant progress. To further enhance the performance of ASR, revising recognition results is one of the lightweight but efficient manners. Various…

Computation and Language · Computer Science 2024-06-14 Yi-Wei Wang, Ke-Han Lu, Kuan-Yu Chen

Personalized Predictive ASR for Latency Reduction in Voice Assistants

Streaming Automatic Speech Recognition (ASR) in voice assistants can utilize prefetching to partially hide the latency of response generation. Prefetching involves passing a preliminary ASR hypothesis to downstream systems in order to…

Computation and Language · Computer Science 2023-05-24 Andreas Schwarz, Di He, Maarten Van Segbroeck, Mohammed Hethnawi, Ariya Rastrow

Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers

Modeling the errors of a speech recognizer can help simulate errorful recognized speech data from plain text, which has proven useful for tasks like discriminative language modeling, improving robustness of NLP systems, where limited or…

Artificial Intelligence · Computer Science 2024-08-22 Prashant Serai, Peidong Wang, Eric Fosler-Lussier

Transcribing Children's Speech: ASR Performance and Obtaining Reliable Orthographic Transcriptions

Automatic speech recognition (ASR) has the potential to substantially reduce manual annotation effort in child speech research by generating automatic transcriptions. However, obtaining reliably high-quality ASR transcriptions for child…

Computation and Language · Computer Science 2026-05-29 Gus Lathouwers, Lingyun Gao, Catia Cucchiarini, Helmer Strik

ASR Context-Sensitive Error Correction Based on Microsoft N-Gram Dataset

At the present time, computers are employed to solve complex tasks and problems ranging from simple calculations to intensive digital image processing and intricate algorithmic optimization problems to computationally-demanding weather…

Computation and Language · Computer Science 2012-03-26 Youssef Bassil, Paul Semaan

Prediction of speech intelligibility with DNN-based performance measures

This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these…

Sound · Computer Science 2022-03-18 Angel Mario Castro Martinez, Constantin Spille, Jana Roßbach, Birger Kollmeier, Bernd T. Meyer

Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline

Automatic speech recognition (ASR) outcomes serve as input for downstream tasks, substantially impacting the satisfaction level of end-users. Hence, the diagnosis and enhancement of the vulnerabilities present in the ASR model bear…

Computation and Language · Computer Science 2024-01-29 Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

Recently, masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition. It usually requires a codebook obtained in an unsupervised way, making it less accurate and difficult to…

Computation and Language · Computer Science 2022-06-22 Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei

Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases

Automatic classification of disordered speech can provide an objective tool for identifying the presence and severity of speech impairment. Classification approaches can also help identify hard-to-recognize speech samples to teach ASR…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-09 Subhashini Venugopalan, Joel Shor, Manoj Plakal, Jimmy Tobin, Katrin Tomanek, Jordan R. Green, Michael P. Brenner

FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible. However, emitting fast without degrading quality, as measured by word error rate (WER), is highly challenging. Existing…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-05 Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu, Ruoming Pang

Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling

Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example pruning words due to acoustics using short-term context, prior to rescoring with…

Computation and Language · Computer Science 2019-07-01 Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou

An Approach to Improve Robustness of NLP Systems against ASR Errors

Speech-enabled systems typically first convert audio to text through an automatic speech recognition (ASR) model and then feed the text to downstream natural language processing (NLP) modules. The errors of the ASR system can seriously…

Computation and Language · Computer Science 2021-03-26 Tong Cui, Jinghui Xiao, Liangyou Li, Xin Jiang, Qun Liu

Benchmarking Automatic Speech Recognition Models for African Languages

Automatic speech recognition (ASR) for African languages remains constrained by limited labeled data and the lack of systematic guidance on model selection, data scaling, and decoding strategies. Large pre-trained systems such as Whisper,…

Computation and Language · Computer Science 2025-12-15 Alvin Nahabwe, Sulaiman Kagumire, Denis Musinguzi, Bruno Beijuka, Jonah Mubuuke Kyagaba, Peter Nabende, Andrew Katumba, Joyce Nakatumba-Nabende