Related papers: Regularized Forward-Backward Decoder for Attention…

Forward-Backward Decoding for Regularizing End-to-End TTS

Neural end-to-end TTS can generate very high-quality synthesized speech, even close to human recordings for text within a similar domain. However, it performs unsatisfactorily when scaled to challenging test sets. One concern is that the…

Audio and Speech Processing · Electrical Eng. & Systems 2019-07-23 Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jianhua Tao

Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation

Predicting the altered acoustic frames is an effective way of self-supervised learning for speech representation. However, it is challenging to prevent the pretrained model from overfitting. In this paper, we proposed to introduce two…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-12 Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models

This paper proposes a simple yet effective way of regularising encoder-decoder-based automatic speech recognition (ASR) models that enhances the robustness of the model and improves generalisation to out-of-domain scenarios. The…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-24 Alexander Polok, Santosh Kesiraju, Karel Beneš, Lukáš Burget, Jan Černocký

DeCRED: Decoder-Centric Regularization for Encoder-Decoder Based Speech Recognition

This paper presents a simple yet effective regularization for the internal language model induced by the decoder in encoder-decoder ASR models, thereby improving robustness and generalization in both in- and out-of-domain settings. The…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-13 Alexander Polok, Santosh Kesiraju, Karel Beneš, Bolaji Yusuf, Lukáš Burget, Jan Černocký

Relaxed Attention for Transformer Models

The powerful modeling capabilities of all-attention-based transformer architectures often cause overfitting and - for natural language processing tasks - lead to an implicitly learned internal language model in the autoregressive…

Machine Learning · Computer Science 2022-09-21 Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt

Focus on the present: a regularization method for the ASR source-target attention layer

This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training. Our method is based on the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-03 Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak

Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

The sequence-to-sequence (seq2seq) task aims at generating the target sequence based on the given input source sequence. Traditionally, most seq2seq tasks are resolved by the Encoder-Decoder framework, which requires an encoder to…

Computation and Language · Computer Science 2023-04-11 Zihao Fu, Wai Lam, Qian Yu, Anthony Man-Cho So, Shengding Hu, Zhiyuan Liu, Nigel Collier

Condenser: a Pre-training Architecture for Dense Retrieval

Pre-trained Transformer language models (LM) have become go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient…

Computation and Language · Computer Science 2021-09-22 Luyu Gao, Jamie Callan

Toward Interpretability of Dual-Encoder Models for Dialogue Response Suggestions

This work shows how to improve and interpret the commonly used dual encoder model for response suggestion in dialogue. We present an attentive dual encoder model that includes an attention mechanism on top of the extracted word-level…

Computation and Language · Computer Science 2020-03-12 Yitong Li, Dianqi Li, Sushant Prakash, Peng Wang

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms,…

Computation and Language · Computer Science 2022-04-26 Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler

Tied Multitask Learning for Neural Speech Translation

We explore multitask models for neural translation of speech, augmenting them in order to reflect two intuitive notions. First, we introduce a model where the second task decoder receives information from the decoder of the first task,…

Computation and Language · Computer Science 2018-04-27 Antonios Anastasopoulos, David Chiang

Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition

Recently, attention-based encoder-decoder (AED) models have shown high performance for end-to-end automatic speech recognition (ASR) across several tasks. Addressing overconfidence in such models, in this paper we introduce the concept of…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-16 Timo Lohrenz, Patrick Schwarz, Zhengyang Li, Tim Fingscheidt

Accelerating Neural Transformer via an Average Attention Network

With parallelizable attention networks, the neural Transformer is very fast to train. However, due to the auto-regressive architecture and self-attention in the decoder, the decoding procedure becomes slow. To alleviate this issue, we…

Computation and Language · Computer Science 2018-05-08 Biao Zhang, Deyi Xiong, Jinsong Su

Regularizing Deep Networks by Modeling and Predicting Label Structure

We construct custom regularization functions for use in supervised training of deep neural networks. Our technique is applicable when the ground-truth labels themselves exhibit internal structure; we derive a regularizer by learning an…

Computer Vision and Pattern Recognition · Computer Science 2018-04-09 Mohammadreza Mostajabi, Michael Maire, Gregory Shakhnarovich

Intermediate-layer output Regularization for Attention-based Speech Recognition with Shared Decoder

Intermediate layer output (ILO) regularization by means of multitask training on encoder side has been shown to be an effective approach to yielding improved results on a wide range of end-to-end ASR frameworks. In this paper, we propose a…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-12 Jicheng Zhang, Yizhou Peng, Haihua Xu, Yi He, Eng Siong Chng, Hao Huang

A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition

Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-08 Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N Sainath, Karen Livescu

DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks

Since 2017, Transformer-based models have played critical roles in various downstream Natural Language Processing tasks. However, a common limitation of the attention mechanism utilized in the Transformer Encoder is that it cannot automatically…

Computation and Language · Computer Science 2022-04-20 Ziyang Luo, Yadong Xi, Jing Ma, Zhiwei Yang, Xiaoxi Mao, Changjie Fan, Rongsheng Zhang

Closed-Book Training to Improve Summarization Encoder Memory

A good neural sequence-to-sequence summarization model should have a strong encoder that can distill and memorize the important information from long input texts so that the decoder can generate salient summaries based on the encoder's…

Computation and Language · Computer Science 2018-09-13 Yichen Jiang, Mohit Bansal

Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer

Deep neural network-based systems have significantly improved the performance of speaker diarization tasks. However, end-to-end neural diarization (EEND) systems often struggle to generalize to scenarios with an unseen number of speakers,…

Sound · Computer Science 2023-09-14 Zhengyang Chen, Bing Han, Shuai Wang, Yanmin Qian

Efficient Attention using a Fixed-Size Memory Representation

The standard content-based attention mechanism typically used in sequence-to-sequence models is computationally expensive as it requires the comparison of large encoder and decoder states at each time step. In this work, we propose an…

Computation and Language · Computer Science 2017-07-04 Denny Britz, Melody Y. Guan, Minh-Thang Luong