Related papers: Regularized Forward-Backward Decoder for Attention…

200 papers

Neural end-to-end TTS can generate very high-quality synthesized speech, even close to human recordings for text within a similar domain. However, it performs unsatisfactorily when scaled to challenging test sets. One concern is that the…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-07-23 · Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jianhua Tao

Predicting the altered acoustic frames is an effective way of self-supervised learning for speech representation. However, it is challenging to prevent the pretrained model from overfitting. In this paper, we propose to introduce two…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-07-12 · Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

This paper proposes a simple yet effective way of regularising encoder-decoder-based automatic speech recognition (ASR) models that enhances the robustness of the model and improves generalisation to out-of-domain scenarios. The…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-10-24 · Alexander Polok, Santosh Kesiraju, Karel Beneš, Lukáš Burget, Jan Černocký

This paper presents a simple yet effective regularization for the internal language model induced by the decoder in encoder-decoder ASR models, thereby improving robustness and generalization in both in- and out-of-domain settings. The…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-08-13 · Alexander Polok, Santosh Kesiraju, Karel Beneš, Bolaji Yusuf, Lukáš Burget, Jan Černocký

The powerful modeling capabilities of all-attention-based transformer architectures often cause overfitting and - for natural language processing tasks - lead to an implicitly learned internal language model in the autoregressive…

Machine Learning · Computer Science · 2022-09-21 · Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt

This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training. Our method is based on the…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-11-03 · Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak

The sequence-to-sequence (seq2seq) task aims at generating the target sequence based on the given input source sequence. Traditionally, most seq2seq tasks are handled by the Encoder-Decoder framework, which requires an encoder to…

Computation and Language · Computer Science · 2023-04-11 · Zihao Fu, Wai Lam, Qian Yu, Anthony Man-Cho So, Shengding Hu, Zhiyuan Liu, Nigel Collier

Pre-trained Transformer language models (LM) have become go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient…

Computation and Language · Computer Science · 2021-09-22 · Luyu Gao, Jamie Callan

This work shows how to improve and interpret the commonly used dual encoder model for response suggestion in dialogue. We present an attentive dual encoder model that includes an attention mechanism on top of the extracted word-level…

Computation and Language · Computer Science · 2020-03-12 · Yitong Li, Dianqi Li, Sushant Prakash, Peng Wang

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms,…

Computation and Language · Computer Science · 2022-04-26 · Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler

We explore multitask models for neural translation of speech, augmenting them in order to reflect two intuitive notions. First, we introduce a model where the second task decoder receives information from the decoder of the first task,…

Computation and Language · Computer Science · 2018-04-27 · Antonios Anastasopoulos, David Chiang

Recently, attention-based encoder-decoder (AED) models have shown high performance for end-to-end automatic speech recognition (ASR) across several tasks. Addressing overconfidence in such models, in this paper we introduce the concept of…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-12-16 · Timo Lohrenz, Patrick Schwarz, Zhengyang Li, Tim Fingscheidt

With parallelizable attention networks, the neural Transformer is very fast to train. However, due to the auto-regressive architecture and self-attention in the decoder, the decoding procedure becomes slow. To alleviate this issue, we…

Computation and Language · Computer Science · 2018-05-08 · Biao Zhang, Deyi Xiong, Jinsong Su

We construct custom regularization functions for use in supervised training of deep neural networks. Our technique is applicable when the ground-truth labels themselves exhibit internal structure; we derive a regularizer by learning an…

Computer Vision and Pattern Recognition · Computer Science · 2018-04-09 · Mohammadreza Mostajabi, Michael Maire, Gregory Shakhnarovich

Intermediate layer output (ILO) regularization by means of multitask training on the encoder side has been shown to be an effective approach to improving results across a wide range of end-to-end ASR frameworks. In this paper, we propose a…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-07-12 · Jicheng Zhang, Yizhou Peng, Haihua Xu, Yi He, Eng Siong Chng, Hao Huang

Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and…

Audio and Speech Processing · Electrical Eng. & Systems · 2018-11-08 · Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N Sainath, Karen Livescu

Since 2017, Transformer-based models have played critical roles in various downstream Natural Language Processing tasks. However, a common limitation of the attention mechanism used in the Transformer encoder is that it cannot automatically…

Computation and Language · Computer Science · 2022-04-20 · Ziyang Luo, Yadong Xi, Jing Ma, Zhiwei Yang, Xiaoxi Mao, Changjie Fan, Rongsheng Zhang

A good neural sequence-to-sequence summarization model should have a strong encoder that can distill and memorize the important information from long input texts so that the decoder can generate salient summaries based on the encoder's…

Computation and Language · Computer Science · 2018-09-13 · Yichen Jiang, Mohit Bansal

Deep neural network-based systems have significantly improved the performance of speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often struggle to generalize to scenarios with an unseen number of speakers,…

Sound · Computer Science · 2023-09-14 · Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian

The standard content-based attention mechanism typically used in sequence-to-sequence models is computationally expensive as it requires the comparison of large encoder and decoder states at each time step. In this work, we propose an…

Computation and Language · Computer Science · 2017-07-04 · Denny Britz, Melody Y. Guan, Minh-Thang Luong