Related papers: Multi-Dialect Speech Recognition With A Single Seq…

Multilingual Speech Recognition With A Single End-To-End Model

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models…

Audio and Speech Processing · Electrical Eng. & Systems · 2018-02-16 · Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS), subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural…

Computation and Language · Computer Science · 2018-02-26 · Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani

A Highly Adaptive Acoustic Model for Accurate Multi-Dialect Speech Recognition

Despite the success of deep learning in speech recognition, multi-dialect speech recognition remains a difficult problem. Although dialect-specific acoustic models are known to perform well in general, they are not easy to maintain when…

Machine Learning · Computer Science · 2022-05-09 · Sanghyun Yoo, Inchul Song, Yoshua Bengio

High Performance Sequence-to-Sequence Model for Streaming Speech Recognition

Recently sequence-to-sequence models have started to achieve state-of-the-art performance on standard speech recognition tasks when processing audio data in batch mode, i.e., the complete audio data is available when starting processing.…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-07-28 · Thai-Son Nguyen, Ngoc-Quan Pham, Sebastian Stueker, Alex Waibel

A Purely End-to-end System for Multi-speaker Speech Recognition

Recently, there has been growing interest in multi-speaker speech recognition, where the utterances of multiple speakers are recognized from their mixture. Promising techniques have been proposed for this task, but earlier works have…

Sound · Computer Science · 2018-05-16 · Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey

Attention-based sequence-to-sequence model for speech recognition: development of state-of-the-art system on LibriSpeech and its application to non-native English

Recent research has shown that attention-based sequence-to-sequence models such as Listen, Attend, and Spell (LAS) yield comparable results to state-of-the-art ASR systems on various tasks. In this paper, we describe the development of such…

Computation and Language · Computer Science · 2018-11-07 · Yan Yin, Ramon Prieto, Bin Wang, Jianwei Zhou, Yiwei Gu, Yang Liu, Hui Lin

A spelling correction model for end-to-end speech recognition

Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network and require only parallel audio-text pairs. Thus, the language…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-02-20 · Jinxi Guo, Tara N. Sainath, Ron J. Weiss

An online sequence-to-sequence model for noisy speech recognition

Generative models have long been the dominant approach for speech recognition. The success of these models however relies on the use of sophisticated recipes and complicated machinery that is not easily accessible to non-practitioners.…

Computation and Language · Computer Science · 2017-06-21 · Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

Listen, Attend and Spell

We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has…

Computation and Language · Computer Science · 2015-08-21 · William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals

Multilingual End-to-End Speech Recognition with A Single Transformer on Low-Resource Languages

Sequence-to-sequence attention-based models integrate an acoustic, pronunciation and language model into a single neural network, which make them very suitable for multilingual automatic speech recognition (ASR). In this paper, we are…

Audio and Speech Processing · Electrical Eng. & Systems · 2018-06-15 · Shiyu Zhou, Shuang Xu, Bo Xu

Acoustic-to-Word Recognition with Sequence-to-Sequence Models

Acoustic-to-Word recognition provides a straightforward solution to end-to-end speech recognition without needing external decoding, language model re-scoring or lexicon. While character-based models offer a natural solution to the…

Audio and Speech Processing · Electrical Eng. & Systems · 2018-08-22 · Shruti Palaskar, Florian Metze

On using 2D sequence-to-sequence models for speech recognition

Attention-based sequence-to-sequence models have shown promising results in automatic speech recognition. Using these architectures, one-dimensional input and output sequences are related by an attention approach, thereby replacing more…

Computation and Language · Computer Science · 2019-11-21 · Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney

ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models

Speech language models refer to language models with speech processing and understanding capabilities. One key desirable capability for speech language models is the ability to capture the intricate interdependency between content and…

Computation and Language · Computer Science · 2025-08-11 · Kaizhi Qian, Xulin Fan, Junrui Ni, Slava Shechtman, Mark Hasegawa-Johnson, Chuang Gan, Yang Zhang

Sequence-to-Sequence Models Can Directly Translate Foreign Speech

We present a recurrent encoder-decoder deep neural network architecture that directly translates speech in one language into text in another. The model does not explicitly transcribe the speech into text in the source language, nor does it…

Computation and Language · Computer Science · 2017-06-13 · Ron J. Weiss, Jan Chorowski, Navdeep Jaitly, Yonghui Wu, Zhifeng Chen

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition

Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial. Previous works utilize linear interpolation or a fusion network to integrate external language models. However, these approaches…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-07-16 · Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen

Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

Neural sequence-to-sequence models are well established for applications which can be cast as mapping a single input sequence into a single output sequence. In this work, we focus on one-to-many sequence transduction problems, such as…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-06-26 · Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie

Sequence-based Multi-lingual Low Resource Speech Recognition

Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are…

Computation and Language · Computer Science · 2018-03-08 · Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black

Self-paced ensemble learning for speech and audio classification

Combining multiple machine learning models into an ensemble is known to provide superior performance levels compared to the individual components forming the ensemble. This is because models can complement each other in taking better…

Sound · Computer Science · 2021-06-09 · Nicolae-Catalin Ristea, Radu Tudor Ionescu

Sequence-to-Sequence Learning with Latent Neural Grammars

Sequence-to-sequence learning with neural networks has become the de facto standard for sequence prediction tasks. This approach typically models the local distribution over the next word with a powerful neural network that can condition on…

Computation and Language · Computer Science · 2021-11-17 · Yoon Kim

Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling

Sequence-to-sequence (seq2seq) approach for low-resource ASR is a relatively new direction in speech research. The approach benefits by performing model training without using lexicon and alignments. However, this poses a new problem of…

Computation and Language · Computer Science · 2018-10-09 · Jaejin Cho, Murali Karthick Baskar, Ruizhi Li, Matthew Wiesner, Sri Harish Mallidi, Nelson Yalta, Martin Karafiat, Shinji Watanabe, Takaaki Hori