Related papers: Improving Automatic Speech Recognition with Decode…

DeCRED: Decoder-Centric Regularization for Encoder-Decoder Based Speech Recognition

This paper presents a simple yet effective regularization for the internal language model induced by the decoder in encoder-decoder ASR models, thereby improving robustness and generalization in both in- and out-of-domain settings. The…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-13 Alexander Polok, Santosh Kesiraju, Karel Beneš, Bolaji Yusuf, Lukáš Burget, Jan Černocký

Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has Transformer-based encoder-decoder architecture which "translates" ASR model output into grammatically and…

Computation and Language · Computer Science 2019-10-24 Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg

Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition

In recent research, in the domain of speech processing, large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have reported state-of-the-art performance on various benchmarks. These systems intrinsically learn how to handle…

Computation and Language · Computer Science 2023-09-06 Patrick Eickhoff, Matthias Möller, Theresa Pekarek Rosin, Johannes Twiefel, Stefan Wermter

Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding

In this paper, we investigate the benefit that off-the-shelf word embedding can bring to the sequence-to-sequence (seq-to-seq) automatic speech recognition (ASR). We first introduced the word embedding regularization by maximizing the…

Computation and Language · Computer Science 2020-02-06 Alexander H. Liu, Tzu-Wei Sung, Shun-Po Chuang, Hung-yi Lee, Lin-shan Lee

Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation

Deep biasing improves automatic speech recognition (ASR) performance by incorporating contextual phrases. However, most existing methods enhance subwords in a contextual phrase as independent units, potentially compromising contextual…

Sound · Computer Science 2025-05-30 Zhennan Lin, Kaixun Huang, Wei Ren, Linju Yang, Lei Xie

Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding

Code-switching (CS) automatic speech recognition (ASR) faces challenges due to the language confusion resulting from accents, auditory similarity, and seamless language switches. Adaptation on the pre-trained multi-lingual model has shown…

Computation and Language · Computer Science 2025-01-07 Jiahui Zhao, Hao Shi, Chenrui Cui, Tianrui Wang, Hexin Liu, Zhaoheng Ni, Lingxuan Ye, Longbiao Wang

A Neural Acoustic Echo Canceller Optimized Using An Automatic Speech Recognizer And Large Scale Synthetic Data

We consider the problem of recognizing speech utterances spoken to a device which is generating a known sound waveform; for example, recognizing queries issued to a digital assistant which is generating responses to previous user inputs.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-03 Nathan Howard, Alex Park, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Rohit Prabhavalkar

Decoder-only Conformer with Modality-aware Sparse Mixtures of Experts for ASR

We present a decoder-only Conformer for automatic speech recognition (ASR) that processes speech and text in a single stack without external speech encoders or pretrained large language models (LLM). The model uses a modality-aware sparse…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-16 Jaeyoung Lee, Masato Mimura

Regularized Forward-Backward Decoder for Attention Models

Nowadays, attention models are one of the popular candidates for speech recognition. So far, many studies mainly focus on the encoder structure or the attention module to enhance the performance of these models. However, mostly ignore the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-29 Tobias Watzel, Ludwig Kürzinger, Lujun Li, Gerhard Rigoll

Too Good to Be True: A Study on Modern Automatic Speech Recognition for the Evaluation of Speech Enhancement

Speech enhancement (SE) systems are typically evaluated using a variety of instrumental metrics. The use of automatic speech recognition (ASR) systems to evaluate SE performance is common in literature, usually in terms of word error rate…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-13 Danilo de Oliveira, Tal Peer, Timo Gerkmann

When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Systems

Speech enhancement methods are commonly believed to improve the performance of automatic speech recognition (ASR) in noisy environments. However, the effectiveness of these techniques cannot be taken for granted in the case of modern…

Sound · Computer Science 2025-12-22 Sujal Chondhekar, Vasanth Murukuri, Rushabh Vasani, Sanika Goyal, Rajshree Badami, Anushree Rana, Sanjana SN, Karthik Pandia, Sulabh Katiyar, Neha Jagadeesh, Sankalp Gulati

Automatic Quality Estimation for ASR System Combination

Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). In order to select the most appropriate words to insert at each position in the output transcriptions, some…

Computation and Language · Computer Science 2017-06-23 Shahab Jalalvand, Matteo Negri, Daniele Falavigna, Marco Matassoni, Marco Turchi

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS), subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural…

Computation and Language · Computer Science 2018-02-26 Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani

Echo State Speech Recognition

We propose automatic speech recognition (ASR) models inspired by echo state network (ESN), in which a subset of recurrent neural networks (RNN) layers in the models are randomly initialized and untrained. Our study focuses on RNN-T and…

Computation and Language · Computer Science 2021-02-19 Harsh Shrivastava, Ankush Garg, Yuan Cao, Yu Zhang, Tara Sainath

Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition

The recent emergence of joint CTC-Attention model shows significant improvement in automatic speech recognition (ASR). The improvement largely lies in the modeling of linguistic information by decoder. The decoder joint-optimized with an…

Computation and Language · Computer Science 2022-10-27 Xulong Zhang, Jianzong Wang, Ning Cheng, Mengyuan Zhao, Zhiyong Zhang, Jing Xiao

Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer

Deep neural network-based systems have significantly improved the performance of speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often struggle to generalize to scenarios with an unseen number of speakers,…

Sound · Computer Science 2023-09-14 Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian

Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation

Automatic Speech Recognition (ASR) systems frequently use a search-based decoding strategy aiming to find the best attainable transcript by considering multiple candidates. One prominent speech recognition decoding heuristic is beam search,…

Computation and Language · Computer Science 2022-12-29 Tomer Wullach, Shlomo E. Chazan

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

Collecting audio-text pairs is expensive; however, it is much easier to access text-only data. Unless using shallow fusion, end-to-end automatic speech recognition (ASR) models require architecture modifications or additional training…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-10 Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on…

Computation and Language · Computer Science 2020-03-25 Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh

Scaling Up Deliberation for Multilingual ASR

Multilingual end-to-end automatic speech recognition models are attractive due to their simplicity in training and deployment. Recent work on large-scale training of such models has shown promising results compared to monolingual models.…

Computation and Language · Computer Science 2022-10-13 Ke Hu, Bo Li, Tara N. Sainath