Related papers: Transformer++

Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism.…

Computation and Language · Computer Science 2023-08-03 Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N. Gomez , Lukasz Kaiser , Illia Polosukhin

Modeling Recurrence for Transformer

Recently, the Transformer model that is based solely on attention mechanisms, has advanced the state-of-the-art on various machine translation tasks. However, recent studies reveal that the lack of recurrence hinders its further improvement…

Computation and Language · Computer Science 2019-04-08 Jie Hao , Xing Wang , Baosong Yang , Longyue Wang , Jinfeng Zhang , Zhaopeng Tu

Weighted Transformer Network for Machine Translation

State-of-the-art results on neural machine translation often use attentional sequence-to-sequence models with some form of convolution or recursion. Vaswani et al. (2017) propose a new architecture that avoids recurrence and convolution…

Artificial Intelligence · Computer Science 2017-11-08 Karim Ahmed , Nitish Shirish Keskar , Richard Socher

Multi-Head Self-Attention with Role-Guided Masks

The state of the art in learning meaningful semantic representations of words is the Transformer model and its attention mechanisms. Simply put, the attention mechanisms learn to attend to specific parts of the input dispensing recurrence…

Computation and Language · Computer Science 2020-12-24 Dongsheng Wang , Casper Hansen , Lucas Chaves Lima , Christian Hansen , Maria Maistro , Jakob Grue Simonsen , Christina Lioma

Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation

Transformer-based models have been achieving state-of-the-art results in several fields of Natural Language Processing. However, its direct application to speech tasks is not trivial. The nature of this sequences carries problems such as…

Computation and Language · Computer Science 2022-05-17 Gerard Sant , Gerard I. Gállego , Belen Alastruey , Marta R. Costa-Jussà

Source Dependency-Aware Transformer with Supervised Self-Attention

Recently, Transformer has achieved the state-of-the-art performance on many machine translation tasks. However, without syntax knowledge explicitly considered in the encoder, incorrect context information that violates the syntax structure…

Computation and Language · Computer Science 2019-09-06 Chengyi Wang , Shuangzhi Wu , Shujie Liu

Learning Hard Retrieval Decoder Attention for Transformers

The Transformer translation model is based on the multi-head attention mechanism, which can be parallelized easily. The multi-head attention network performs the scaled dot-product attention function in parallel, empowering the model by…

Computation and Language · Computer Science 2021-09-13 Hongfei Xu , Qiuhui Liu , Josef van Genabith , Deyi Xiong

Enhancing Machine Translation with Dependency-Aware Self-Attention

Most neural machine translation models only rely on pairs of parallel sentences, assuming syntactic information is automatically learned by an attention mechanism. In this work, we investigate different approaches to incorporate syntactic…

Computation and Language · Computer Science 2020-04-22 Emanuele Bugliarello , Naoaki Okazaki

Evolving Attention with Residual Convolutions

Transformer is a ubiquitous model for natural language processing and has attracted wide attentions in computer vision. The attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However,…

Machine Learning · Computer Science 2021-02-26 Yujing Wang , Yaming Yang , Jiangang Bai , Mingliang Zhang , Jing Bai , Jing Yu , Ce Zhang , Gao Huang , Yunhai Tong

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks

Many machine learning tasks such as multiple instance learning, 3D shape recognition, and few-shot image classification are defined on sets of instances. Since solutions to such problems do not depend on the order of elements of the set,…

Machine Learning · Computer Science 2019-05-28 Juho Lee , Yoonho Lee , Jungtaek Kim , Adam R. Kosiorek , Seungjin Choi , Yee Whye Teh

Learning Source Phrase Representations for Neural Machine Translation

The Transformer translation model (Vaswani et al., 2017) based on a multi-head attention mechanism can be computed effectively in parallel and has significantly pushed forward the performance of Neural Machine Translation (NMT). Though…

Computation and Language · Computer Science 2020-06-26 Hongfei Xu , Josef van Genabith , Deyi Xiong , Qiuhui Liu , Jingyi Zhang

Parallel Attention Mechanisms in Neural Machine Translation

Recent papers in neural machine translation have proposed the strict use of attention mechanisms over previous standards such as recurrent and convolutional neural networks (RNNs and CNNs). We propose that by running traditionally stacked…

Computation and Language · Computer Science 2018-10-31 Julian Richard Medina , Jugal Kalita

A Tensorized Transformer for Language Modeling

Latest development of neural models has connected the encoder and decoder through a self-attention mechanism. In particular, Transformer, which is solely based on self-attention, has led to breakthroughs in Natural Language Processing (NLP)…

Computation and Language · Computer Science 2019-11-07 Xindian Ma , Peng Zhang , Shuai Zhang , Nan Duan , Yuexian Hou , Dawei Song , Ming Zhou

Multi-View Self-Attention Based Transformer for Speaker Recognition

Initially developed for natural language processing (NLP), Transformer model is now widely used for speech processing tasks such as speaker recognition, due to its powerful sequence modeling capabilities. However, conventional…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-28 Rui Wang , Junyi Ao , Long Zhou , Shujie Liu , Zhihua Wei , Tom Ko , Qing Li , Yu Zhang

Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel

Transformer is a powerful architecture that achieves superior performance on various sequence learning tasks, including neural machine translation, language understanding, and sequence prediction. At the core of the Transformer is the…

Machine Learning · Computer Science 2019-11-13 Yao-Hung Hubert Tsai , Shaojie Bai , Makoto Yamada , Louis-Philippe Morency , Ruslan Salakhutdinov

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation

Transformer-based models have brought a radical change to neural machine translation. A key feature of the Transformer architecture is the so-called multi-head attention mechanism, which allows the model to focus simultaneously on different…

Computation and Language · Computer Science 2020-10-06 Alessandro Raganato , Yves Scherrer , Jörg Tiedemann

Keyword Transformer: A Self-Attention Model for Keyword Spotting

The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-11 Axel Berg , Mark O'Connor , Miguel Tairum Cruz

Integrating Dependency Tree Into Self-attention for Sentence Representation

Recent progress on parse tree encoder for sentence representation learning is notable. However, these works mainly encode tree structures recursively, which is not conducive to parallelization. On the other hand, these works rarely take…

Computation and Language · Computer Science 2022-05-10 Junhua Ma , Jiajun Li , Yuxuan Liu , Shangbo Zhou , Xue Li

Memory Transformer

Transformer-based models have achieved state-of-the-art results in many natural language processing tasks. The self-attention architecture allows transformer to combine information from all elements of a sequence into context-aware…

Computation and Language · Computer Science 2021-02-17 Mikhail S. Burtsev , Yuri Kuratov , Anton Peganov , Grigory V. Sapunov

Neural Machine Translation with Key-Value Memory-Augmented Attention

Although attention-based Neural Machine Translation (NMT) has achieved remarkable progress in recent years, it still suffers from issues of repeating and dropping translations. To alleviate these issues, we propose a novel key-value…

Computation and Language · Computer Science 2018-07-02 Fandong Meng , Zhaopeng Tu , Yong Cheng , Haiyang Wu , Junjie Zhai , Yuekui Yang , Di Wang