Related papers: Internal Language Model Estimation Through Explici…

200 papers

Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. The integration with an external LM trained on much more unpaired text usually leads to better performance. A…

Computation and Language · Computer Science 2021-06-18 Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney

The external language models (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR) which has no clear division between acoustic and language models. In this work, we propose an internal LM…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-05 Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong

The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method. In this method, the…

Audio and Speech Processing · Electrical Eng. & Systems 2021-04-26 Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong

Text-only adaptation of an end-to-end (E2E) model remains a challenging task for automatic speech recognition (ASR). Language model (LM) fusion-based approaches require an additional external LM during inference, significantly increasing…

Computation and Language · Computer Science 2022-11-01 Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li, Xie Chen, Yu Wu, Yifan Gong

With approximately 7,000 languages spoken worldwide, current large language models (LLMs) support only a small subset. Prior research indicates LLMs can learn new languages for certain tasks without supervised data. We extend this…

Computation and Language · Computer Science 2026-01-29 Zhaolin Li, Jan Niehues

The attention-based end-to-end (E2E) automatic speech recognition (ASR) architecture allows for joint optimization of acoustic and language models within a single network. However, in a vanilla E2E ASR architecture, the decoder sub-network…

Computation and Language · Computer Science 2019-12-03 Van Tung Pham, Haihua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma, Haizhou Li

Language modeling (LM) for automatic speech recognition (ASR) does not usually incorporate utterance level contextual information. For some domains like voice assistants, however, additional context, such as the time at which an utterance…

Computation and Language · Computer Science 2021-06-04 Richard Diehl Martinez, Scott Novotney, Ivan Bulyko, Ariya Rastrow, Andreas Stolcke, Ankur Gandhe

In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In…

Despite the rapid progress of end-to-end (E2E) automatic speech recognition (ASR), it has been shown that incorporating external language models (LMs) into the decoding can further improve the recognition performance of E2E ASR systems. To…

Computation and Language · Computer Science 2022-04-13 Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu

Automatic speech recognition (ASR) systems normally consist of an acoustic model (AM) and a language model (LM). The acoustic model estimates the probability distribution of text given the input speech, while the language model calibrates…

Computation and Language · Computer Science 2025-06-17 Qingliang Meng, Pengju Ren, Tian Li, Changsong Dai, Huizhi Liang

Integrating audio encoders with LLMs through connectors has enabled these models to process and comprehend audio modalities, significantly enhancing speech-to-text tasks, including automatic speech recognition (ASR) and automatic speech…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-18 Hongfei Xue, Wei Ren, Xuelong Geng, Kun Wei, Longhao Li, Qijie Shao, Linju Yang, Kai Diao, Lei Xie

Language identification (LID) recognizes the language of a spoken utterance automatically. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with a LID task…

Audio and Speech Processing · Electrical Eng. & Systems 2023-04-17 Jinseok Park, Hyung Yong Kim, Jihwan Park, Byeong-Yeol Kim, Shukjae Choi, Yunkyu Lim

In automatic speech recognition (ASR) what a user says depends on the particular context she is in. Typically, this context is represented as a set of word n-grams. In this work, we present a novel, all-neural, end-to-end (E2E) ASR system…

Audio and Speech Processing · Electrical Eng. & Systems 2018-08-09 Golan Pundak, Tara N. Sainath, Rohit Prabhavalkar, Anjuli Kannan, Ding Zhao

We develop a large language model (LLM) based automatic speech recognition (ASR) system that can be contextualized by providing keywords as prior information in text prompts. We adopt decoder-only architecture and use our in-house LLM,…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-14 Kento Nozawa, Takashi Masuko, Toru Taniguchi

Utilizing text-only data with an external language model (ELM) in end-to-end RNN-Transducer (RNN-T) for speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal language model estimation (ILME)…

Audio and Speech Processing · Electrical Eng. & Systems 2022-08-04 Huahuan Zheng, Keyu An, Zhijian Ou, Chen Huang, Ke Ding, Guanglu Wan

The mismatch between an external language model (LM) and the implicitly learned internal LM (ILM) of RNN-Transducer (RNN-T) can limit the performance of LM integration such as simple shallow fusion. A Bayesian interpretation suggests to…

Computation and Language · Computer Science 2022-02-17 Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney

End-to-end modeling (E2E) of automatic speech recognition (ASR) blends all the components of a traditional speech recognition system into a unified model. Although it simplifies training and decoding pipelines, the unified model is hard to…

Computation and Language · Computer Science 2018-12-06 Zhehuai Chen, Mahaveer Jain, Yongqiang Wang, Michael L. Seltzer, Christian Fuegen

Automatic speech recognition (ASR) still covers only a small fraction of the world's languages, mainly due to supervised data scarcity. In-context learning (ICL) with large language models (LLMs) addresses this problem, but prior work…

Computation and Language · Computer Science 2026-04-21 Zhaolin Li, Jan Niehues

In real-world applications, automatic speech recognition (ASR) systems must handle overlapping speech from multiple speakers and recognize rare words like technical terms. Traditional methods address multi-talker ASR and contextual biasing…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-17 Jiajun He, Naoki Sawada, Koichi Miyazaki, Tomoki Toda

We propose a novel approach to end-to-end automatic speech recognition (ASR) to achieve efficient speech in-context learning (SICL) for (i) long-form speech decoding, (ii) test-time speaker adaptation, and (iii) test-time contextual…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-01 Hao Yen, Shaoshi Ling, Guoli Ye