Related papers: High Performance Sequence-to-Sequence Model for St…

200 papers

Attention-based sequence-to-sequence models have shown promising results in automatic speech recognition. Using these architectures, one-dimensional input and output sequences are related by an attention approach, thereby replacing more…

Computation and Language · Computer Science 2019-11-21 Parnia Bahar, Albert Zeyer, Ralf Schlüter, Hermann Ney

Attention-based models have been gaining popularity recently for their strong performance demonstrated in fields such as machine translation and automatic speech recognition. One major challenge of attention-based models is the need of…

Computation and Language · Computer Science 2020-11-17 Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian Chan, Michael L. Seltzer

Attention-based encoder-decoder architectures, such as Listen, Attend, and Spell (LAS), subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural…

Typical high quality text-to-speech (TTS) systems today use a two-stage architecture, with a spectrum model stage that generates spectral frames and a vocoder stage that generates the actual audio. High-quality spectrum models usually…

Sound · Computer Science 2021-04-05 Qing He, Zhiping Xiu, Thilo Koehler, Jilong Wu

This paper investigates the framework of encoder-decoder with attention for sequence labelling based spoken language understanding. We introduce Bidirectional Long Short Term Memory - Long Short Term Memory networks (BLSTM-LSTM) as the…

Computation and Language · Computer Science 2017-03-14 Su Zhu, Kai Yu

Recently, a few novel streaming attention-based sequence-to-sequence (S2S) models have been proposed to perform online speech recognition with linear-time decoding complexity. However, in these models, the decisions to generate tokens are…

Computation and Language · Computer Science 2020-05-18 Hirofumi Inaguma, Yashesh Gaur, Liang Lu, Jinyu Li, Yifan Gong

Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding separate components of a typical system, namely acoustic (AM), pronunciation (PM) and language (LM) models into a single…

Audio and Speech Processing · Electrical Eng. & Systems 2017-12-06 Bo Li, Tara N. Sainath, Khe Chai Sim, Michiel Bacchiani, Eugene Weinstein, Patrick Nguyen, Zhifeng Chen, Yonghui Wu, Kanishka Rao

This paper presents an end-to-end text-to-speech system with low latency on a CPU, suitable for real-time applications. The system is composed of an autoregressive attention-based sequence-to-sequence acoustic model and the LPCNet vocoder…

Generative models have long been the dominant approach for speech recognition. The success of these models however relies on the use of sophisticated recipes and complicated machinery that is not easily accessible to non-practitioners.…

Computation and Language · Computer Science 2017-06-21 Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve a further performance improvement, in this research, deep extensions on…

Computation and Language · Computer Science 2015-05-12 Xiangang Li, Xihong Wu

While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion,…

Audio and Speech Processing · Electrical Eng. & Systems 2020-03-18 Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong

The task of automatic language identification (LID) involving multiple dialects of the same language family in the presence of noise is a challenging problem. In these scenarios, the identity of the language/dialect may be reliably present…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-06 Bharat Padi, Anand Mohan, Sriram Ganapathy

Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs). Most of these systems contain separate components that deal with the…

Computation and Language · Computer Science 2016-03-16 Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio

We introduce Delayed Streams Modeling (DSM), a flexible formulation for streaming, multimodal sequence-to-sequence learning. Sequence-to-sequence generation is often cast in an offline manner, where the model consumes the complete input…

Long short-term memory recurrent neural networks (LSTM-RNNs) are considered state-of-the-art in many speech processing tasks. The recurrence in the network, in principle, allows any input to be remembered for an indefinite time, a feature…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-02 Jeroen Zegers, Hugo Van hamme

Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years. Compared to the traditional pipeline system, the end-to-end ST model has potential…

Computation and Language · Computer Science 2019-12-17 Yuchen Liu, Jiajun Zhang, Hao Xiong, Long Zhou, Zhongjun He, Hua Wu, Haifeng Wang, Chengqing Zong

In interactive automatic speech recognition (ASR) systems, low-latency requirements limit the amount of search space that can be explored during decoding, particularly in end-to-end neural ASR. In this paper, we present a novel streaming…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-29 Denis Filimonov, Prabhat Pandey, Ariya Rastrow, Ankur Gandhe, Andreas Stolcke

Cascaded speech-to-speech translation systems often suffer from the error accumulation problem and high latency, which is a result of cascaded modules whose inference delays accumulate. In this paper, we propose a transducer-based speech…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-07 Jinzheng Zhao, Niko Moritz, Egor Lakomkin, Ruiming Xie, Zhiping Xiu, Katerina Zmolikova, Zeeshan Ahmed, Yashesh Gaur, Duc Le, Christian Fuegen

This paper addresses the challenges of mining latent patterns and modeling contextual dependencies in complex sequence data. A sequence pattern mining algorithm is proposed by integrating Bidirectional Long Short-Term Memory (BiLSTM) with a…

Machine Learning · Computer Science 2025-04-22 Tao Yang, Yu Cheng, Yaokun Ren, Yujia Lou, Minggu Wei, Honghui Xin

Visual speech recognition models traditionally consist of two stages, feature extraction and classification. Several deep learning approaches have been recently presented aiming to replace the feature extraction stage by automatically…

Computer Vision and Pattern Recognition · Computer Science 2019-07-10 Stavros Petridis, Yujiang Wang, Pingchuan Ma, Zuwei Li, Maja Pantic