Related papers: Intermediate-layer output Regularization for Atten…

200 papers

Incremental Decoding is an effective framework that enables the use of an offline model in a simultaneous setting without modifying the original model, making it suitable for Low-Latency Simultaneous Speech Translation. However, this…

Computation and Language · Computer Science 2024-01-12 Jiaxin Guo, Zhanglin Wu, Zongyao Li, Hengchao Shang, Daimeng Wei, Xiaoyu Chen, Zhiqiang Rao, Shaojun Li, Hao Yang

This study presents a novel approach for knowledge distillation (KD) from a BERT teacher model to an automatic speech recognition (ASR) model using intermediate layers. To distil the teacher's knowledge, we use an attention decoder that…

Computation and Language · Computer Science 2024-01-23 Michael Hentschel, Yuta Nishikawa, Tatsuya Komatsu, Yusuke Fujita

We propose Intermediate Layer Optimization (ILO), a novel optimization algorithm for solving inverse problems with deep generative models. Instead of optimizing only over the initial latent code, we progressively change the input layer…

Machine Learning · Computer Science 2021-02-16 Giannis Daras, Joseph Dean, Ajil Jalal, Alexandros G. Dimakis

Nowadays, attention models are one of the popular candidates for speech recognition. So far, many studies mainly focus on the encoder structure or the attention module to enhance the performance of these models. However, mostly ignore the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-29 Tobias Watzel, Ludwig Kürzinger, Lujun Li, Gerhard Rigoll

Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years. Compared to the traditional pipeline system, the end-to-end ST model has potential…

Computation and Language · Computer Science 2019-12-17 Yuchen Liu, Jiajun Zhang, Hao Xiong, Long Zhou, Zhongjun He, Hua Wu, Haifeng Wang, Chengqing Zong

This paper proposes a simple yet effective way of regularising the encoder-decoder-based automatic speech recognition (ASR) models that enhance the robustness of the model and improve the generalisation to out-of-domain scenarios. The…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-24 Alexander Polok, Santosh Kesiraju, Karel Beneš, Lukáš Burget, Jan Černocký

End-to-end (E2E) automatic speech recognition (ASR) systems have revolutionized the field by integrating all components into a single neural network, with attention-based encoder-decoder models achieving state-of-the-art performance.…

Computation and Language · Computer Science 2025-07-01 Duygu Altinok

An end-to-end (E2E) ASR model implicitly learns a prior Internal Language Model (ILM) from the training transcripts. To fuse an external LM using Bayes posterior theory, the log likelihood produced by the ILM has to be accurately estimated…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-03 Yufei Liu, Rao Ma, Haihua Xu, Yi He, Zejun Ma, Weibin Zhang

This paper proposes a self-regularised minimum latency training (SR-MLT) method for streaming Transformer-based automatic speech recognition (ASR) systems. In previous works, latency was optimised by truncating the online attention weights…

Audio and Speech Processing · Electrical Eng. & Systems 2023-04-25 Mohan Li, Rama Doddipatla, Catalin Zorila

Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. The integration with an external LM trained on much more unpaired text usually leads to better performance. A…

Computation and Language · Computer Science 2021-06-18 Mohammad Zeineldeen, Aleksandr Glushko, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney

Regularization techniques are crucial to improving the generalization performance and training efficiency of deep neural networks. Many deep learning algorithms rely on weight decay, dropout, batch/layer normalization to converge faster and…

Machine Learning · Computer Science 2025-05-23 Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais

In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last…

Computation and Language · Computer Science 2022-08-30 Fenglin Liu, Xuancheng Ren, Guangxiang Zhao, Chenyu You, Xuewei Ma, Xian Wu, Xu Sun

This paper proposes serialized output training (SOT), a novel framework for multi-speaker overlapped speech recognition based on an attention-based encoder-decoder approach. Instead of having multiple output layers as with the permutation…

Computation and Language · Computer Science 2020-08-11 Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka

Recently, pioneer work finds that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information.…

Audio and Speech Processing · Electrical Eng. & Systems 2021-12-17 Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

End-to-end training of deep learning-based models allows for implicit learning of intermediate representations based on the final task loss. However, the end-to-end approach ignores the useful domain knowledge encoded in explicit…

Computation and Language · Computer Science 2017-04-20 Shubham Toshniwal, Hao Tang, Liang Lu, Karen Livescu

In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary. This paper introduces a streaming…

Computation and Language · Computer Science 2023-10-03 Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Jinyu Li, Yashesh Gaur

Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problems for automatic speech recognition (ASR). Serialized output training (SOT) is a classic method to address multi-talker…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-02 Mohan Shi, Zengrui Jin, Yaoxun Xu, Yong Xu, Shi-Xiong Zhang, Kun Wei, Yiwen Shao, Chunlei Zhang, Dong Yu

We propose a multitask training method for attention-based end-to-end speech recognition models. We regularize the decoder in a listen, attend, and spell model by multitask training it on both audio-text and text-only data. Trained on the…

Computation and Language · Computer Science 2021-06-15 Peidong Wang, Tara N. Sainath, Ron J. Weiss

Predicting the altered acoustic frames is an effective way of self-supervised learning for speech representation. However, it is challenging to prevent the pretrained model from overfitting. In this paper, we proposed to introduce two…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-12 Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

Traditional neural machine translation is limited to the topmost encoder layer's context representation and cannot directly perceive the lower encoder layers. Existing solutions usually rely on the adjustment of network architecture, making…

Computation and Language · Computer Science 2020-11-04 Qiang Wang, Changliang Li, Yue Zhang, Tong Xiao, Jingbo Zhu