Related papers: Improving Automatic Speech Recognition with Decode…

200 papers

This paper presents a simple yet effective regularization for the internal language model induced by the decoder in encoder-decoder ASR models, thereby improving robustness and generalization in both in- and out-of-domain settings. The…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-13 Alexander Polok, Santosh Kesiraju, Karel Beneš, Bolaji Yusuf, Lukáš Burget, Jan Černocký

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has a Transformer-based encoder-decoder architecture which "translates" ASR model output into grammatically and…

Computation and Language · Computer Science 2019-10-24 Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg

In recent research, in the domain of speech processing, large End-to-End (E2E) systems for Automatic Speech Recognition (ASR) have reported state-of-the-art performance on various benchmarks. These systems intrinsically learn how to handle…

Computation and Language · Computer Science 2023-09-06 Patrick Eickhoff, Matthias Möller, Theresa Pekarek Rosin, Johannes Twiefel, Stefan Wermter

In this paper, we investigate the benefit that off-the-shelf word embedding can bring to the sequence-to-sequence (seq-to-seq) automatic speech recognition (ASR). We first introduced the word embedding regularization by maximizing the…

Computation and Language · Computer Science 2020-02-06 Alexander H. Liu, Tzu-Wei Sung, Shun-Po Chuang, Hung-yi Lee, Lin-shan Lee

Deep biasing improves automatic speech recognition (ASR) performance by incorporating contextual phrases. However, most existing methods enhance subwords in a contextual phrase as independent units, potentially compromising contextual…

Sound · Computer Science 2025-05-30 Zhennan Lin, Kaixun Huang, Wei Ren, Linju Yang, Lei Xie

Code-switching (CS) automatic speech recognition (ASR) faces challenges due to the language confusion resulting from accents, auditory similarity, and seamless language switches. Adaptation on the pre-trained multi-lingual model has shown…

Computation and Language · Computer Science 2025-01-07 Jiahui Zhao, Hao Shi, Chenrui Cui, Tianrui Wang, Hexin Liu, Zhaoheng Ni, Lingxuan Ye, Longbiao Wang

We consider the problem of recognizing speech utterances spoken to a device which is generating a known sound waveform; for example, recognizing queries issued to a digital assistant which is generating responses to previous user inputs.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-03 Nathan Howard, Alex Park, Turaj Zakizadeh Shabestary, Alexander Gruenstein, Rohit Prabhavalkar

We present a decoder-only Conformer for automatic speech recognition (ASR) that processes speech and text in a single stack without external speech encoders or pretrained large language models (LLM). The model uses a modality-aware sparse…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-16 Jaeyoung Lee, Masato Mimura

Nowadays, attention models are one of the popular candidates for speech recognition. So far, many studies mainly focus on the encoder structure or the attention module to enhance the performance of these models. However, they mostly ignore the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-29 Tobias Watzel, Ludwig Kürzinger, Lujun Li, Gerhard Rigoll

Speech enhancement (SE) systems are typically evaluated using a variety of instrumental metrics. The use of automatic speech recognition (ASR) systems to evaluate SE performance is common in the literature, usually in terms of word error rate…

Audio and Speech Processing · Electrical Eng. & Systems 2026-05-13 Danilo de Oliveira, Tal Peer, Timo Gerkmann

Speech enhancement methods are commonly believed to improve the performance of automatic speech recognition (ASR) in noisy environments. However, the effectiveness of these techniques cannot be taken for granted in the case of modern…

Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). In order to select the most appropriate words to insert at each position in the output transcriptions, some…

Computation and Language · Computer Science 2017-06-23 Shahab Jalalvand, Matteo Negri, Daniele Falavigna, Marco Matassoni, Marco Turchi

Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS) subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural…

We propose automatic speech recognition (ASR) models inspired by echo state network (ESN), in which a subset of recurrent neural networks (RNN) layers in the models are randomly initialized and untrained. Our study focuses on RNN-T and…

Computation and Language · Computer Science 2021-02-19 Harsh Shrivastava, Ankush Garg, Yuan Cao, Yu Zhang, Tara Sainath

The recent emergence of the joint CTC-Attention model shows significant improvement in automatic speech recognition (ASR). The improvement largely lies in the modeling of linguistic information by the decoder. The decoder jointly optimized with an…

Computation and Language · Computer Science 2022-10-27 Xulong Zhang, Jianzong Wang, Ning Cheng, Mengyuan Zhao, Zhiyong Zhang, Jing Xiao

Deep neural network-based systems have significantly improved the performance of speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often struggle to generalize to scenarios with an unseen number of speakers,…

Sound · Computer Science 2023-09-14 Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian

Automatic Speech Recognition (ASR) systems frequently use a search-based decoding strategy aiming to find the best attainable transcript by considering multiple candidates. One prominent speech recognition decoding heuristic is beam search,…

Computation and Language · Computer Science 2022-12-29 Tomer Wullach, Shlomo E. Chazan

Collecting audio-text pairs is expensive; however, it is much easier to access text-only data. Unless using shallow fusion, end-to-end automatic speech recognition (ASR) models require architecture modifications or additional training…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-10 Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on…

Computation and Language · Computer Science 2020-03-25 Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh

Multilingual end-to-end automatic speech recognition models are attractive due to their simplicity in training and deployment. Recent work on large-scale training of such models has shown promising results compared to monolingual models.…

Computation and Language · Computer Science 2022-10-13 Ke Hu, Bo Li, Tara N. Sainath