English
Related papers

Related papers: Multi-Dialect Speech Recognition With A Single Seq…

200 papers

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models…

Audio and Speech Processing · Electrical Eng. & Systems 2018-02-16 Shubham Toshniwal , Tara N. Sainath , Ron J. Weiss , Bo Li , Pedro Moreno , Eugene Weinstein , Kanishka Rao

Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS), subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural…

Despite the success of deep learning in speech recognition, multi-dialect speech recognition remains a difficult problem. Although dialect-specific acoustic models are known to perform well in general, they are not easy to maintain when…

Machine Learning · Computer Science 2022-05-09 Sanghyun Yoo , Inchul Song , Yoshua Bengio

Recently sequence-to-sequence models have started to achieve state-of-the-art performance on standard speech recognition tasks when processing audio data in batch mode, i.e., the complete audio data is available when starting processing.…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-28 Thai-Son Nguyen , Ngoc-Quan Pham , Sebastian Stueker , Alex Waibel

Recently, there has been growing interest in multi-speaker speech recognition, where the utterances of multiple speakers are recognized from their mixture. Promising techniques have been proposed for this task, but earlier works have…

Sound · Computer Science 2018-05-16 Hiroshi Seki , Takaaki Hori , Shinji Watanabe , Jonathan Le Roux , John R. Hershey

Recent research has shown that attention-based sequence-to-sequence models such as Listen, Attend, and Spell (LAS) yield comparable results to state-of-the-art ASR systems on various tasks. In this paper, we describe the development of such…

Computation and Language · Computer Science 2018-11-07 Yan Yin , Ramon Prieto , Bin Wang , Jianwei Zhou , Yiwei Gu , Yang Liu , Hui Lin

Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network and require only parallel audio-text pairs. Thus, the language…

Audio and Speech Processing · Electrical Eng. & Systems 2019-02-20 Jinxi Guo , Tara N. Sainath , Ron J. Weiss

Generative models have long been the dominant approach for speech recognition. The success of these models however relies on the use of sophisticated recipes and complicated machinery that is not easily accessible to non-practitioners.…

Computation and Language · Computer Science 2017-06-21 Chung-Cheng Chiu , Dieterich Lawson , Yuping Luo , George Tucker , Kevin Swersky , Ilya Sutskever , Navdeep Jaitly

We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has…

Computation and Language · Computer Science 2015-08-21 William Chan , Navdeep Jaitly , Quoc V. Le , Oriol Vinyals

Sequence-to-sequence attention-based models integrate an acoustic, pronunciation and language model into a single neural network, which make them very suitable for multilingual automatic speech recognition (ASR). In this paper, we are…

Audio and Speech Processing · Electrical Eng. & Systems 2018-06-15 Shiyu Zhou , Shuang Xu , Bo Xu

Acoustic-to-Word recognition provides a straightforward solution to end-to-end speech recognition without needing external decoding, language model re-scoring or lexicon. While character-based models offer a natural solution to the…

Audio and Speech Processing · Electrical Eng. & Systems 2018-08-22 Shruti Palaskar , Florian Metze

Attention-based sequence-to-sequence models have shown promising results in automatic speech recognition. Using these architectures, one-dimensional input and output sequences are related by an attention approach, thereby replacing more…

Computation and Language · Computer Science 2019-11-21 Parnia Bahar , Albert Zeyer , Ralf Schlüter , Hermann Ney

Speech language models refer to language models with speech processing and understanding capabilities. One key desirable capability for speech language models is the ability to capture the intricate interdependency between content and…

Computation and Language · Computer Science 2025-08-11 Kaizhi Qian , Xulin Fan , Junrui Ni , Slava Shechtman , Mark Hasegawa-Johnson , Chuang Gan , Yang Zhang

We present a recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another. The model does not explicitly transcribe the speech into text in the source language, nor does it…

Computation and Language · Computer Science 2017-06-13 Ron J. Weiss , Jan Chorowski , Navdeep Jaitly , Yonghui Wu , Zhifeng Chen

Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial. Previous works utilize linear interpolation or a fusion network to integrate external language models. However, these approaches…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-16 Ye Bai , Jiangyan Yi , Jianhua Tao , Zhengkun Tian , Zhengqi Wen

Neural sequence-to-sequence models are well established for applications which can be cast as mapping a single input sequence into a single output sequence. In this work, we focus on one-to-many sequence transduction problems, such as…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-26 Jing Shi , Xuankai Chang , Pengcheng Guo , Shinji Watanabe , Yusuke Fujita , Jiaming Xu , Bo Xu , Lei Xie

Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are…

Computation and Language · Computer Science 2018-03-08 Siddharth Dalmia , Ramon Sanabria , Florian Metze , Alan W. Black

Combining multiple machine learning models into an ensemble is known to provide superior performance levels compared to the individual components forming the ensemble. This is because models can complement each other in taking better…

Sound · Computer Science 2021-06-09 Nicolae-Catalin Ristea , Radu Tudor Ionescu

Sequence-to-sequence learning with neural networks has become the de facto standard for sequence prediction tasks. This approach typically models the local distribution over the next word with a powerful neural network that can condition on…

Computation and Language · Computer Science 2021-11-17 Yoon Kim

Sequence-to-sequence (seq2seq) approach for low-resource ASR is a relatively new direction in speech research. The approach benefits by performing model training without using lexicon and alignments. However, this poses a new problem of…

‹ Prev 1 2 3 10 Next ›