Related papers: SG-Net: Syntax-Guided Machine Reading Comprehensio…

SG-Net: Syntax Guided Transformer for Language Representation

Understanding human language is one of the key themes of artificial intelligence. For language representation, the capacity of effectively modeling the linguistic knowledge from the detail-riddled and lengthy texts and getting rid of the…

Computation and Language · Computer Science 2021-01-08 Zhuosheng Zhang, Yuwei Wu, Junru Zhou, Sufeng Duan, Hai Zhao, Rui Wang

Deps-SAN: Neural Machine Translation with Dependency-Scaled Self-Attention Network

Syntax knowledge contributes its powerful strength in Neural machine translation (NMT) tasks. Early NMT works supposed that syntax details can be automatically learned from numerous texts via attention networks. However, succeeding…

Computation and Language · Computer Science 2022-10-05 Ru Peng, Nankai Lin, Yi Fang, Shengyi Jiang, Tianyong Hao, Boyu Chen, Junbo Zhao

Syntax-guided Localized Self-attention by Constituency Syntactic Distance

Recent works have revealed that Transformers implicitly learn syntactic information in their lower layers from data, albeit in a way that is highly dependent on the quality and scale of the training data. However, learning syntactic information…

Computation and Language · Computer Science 2022-10-24 Shengyuan Hou, Jushi Kai, Haotian Xue, Bingyu Zhu, Bo Yuan, Longtao Huang, Xinbing Wang, Zhouhan Lin

Syntactic Knowledge via Graph Attention with BERT in Machine Translation

Although the Transformer model can effectively acquire context features via a self-attention mechanism, deeper syntactic knowledge is still not effectively modeled. To alleviate the above problem, we propose Syntactic knowledge via Graph…

Computation and Language · Computer Science 2023-05-24 Yuqian Dai, Serge Sharoff, Marc de Kamps

Distance-based Self-Attention Network for Natural Language Inference

Attention mechanism has been used as an ancillary means to help RNN or CNN. However, the Transformer (Vaswani et al., 2017) recently recorded the state-of-the-art performance in machine translation with a dramatic reduction in training time…

Computation and Language · Computer Science 2017-12-07 Jinbae Im, Sungzoon Cho

Improving BERT with Syntax-aware Local Attention

Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on varieties of NLP tasks. Recent works have shown that attention-based models can benefit from more focused attention over local regions.…

Computation and Language · Computer Science 2021-05-25 Zhongli Li, Qingyu Zhou, Chao Li, Ke Xu, Yunbo Cao

Syntax-Enhanced Pre-trained Model

We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa. Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning stage, so that they…

Computation and Language · Computer Science 2021-06-01 Zenan Xu, Daya Guo, Duyu Tang, Qinliang Su, Linjun Shou, Ming Gong, Wanjun Zhong, Xiaojun Quan, Nan Duan, Daxin Jiang

Transformer-Transducer: End-to-End Speech Recognition with Self-Attention

We explore options to use Transformer networks in neural transducer for end-to-end speech recognition. Transformer networks use self-attention for sequence modeling and come with advantages in parallel computation and capturing contexts.…

Audio and Speech Processing · Electrical Eng. & Systems 2019-10-30 Ching-Feng Yeh, Jay Mahadeokar, Kaustubh Kalgaonkar, Yongqiang Wang, Duc Le, Mahaveer Jain, Kjell Schubert, Christian Fuegen, Michael L. Seltzer

Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed

The utility of linguistic annotation in neural machine translation seemed to have been established in past papers. The experiments were however limited to recurrent sequence-to-sequence architectures and relatively small data settings. We…

Computation and Language · Computer Science 2019-10-25 Thuong-Hai Pham, Dominik Macháček, Ondřej Bojar

Stochastic Answer Networks for Machine Reading Comprehension

We propose a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension. Compared to previous work such as ReasoNet which used reinforcement learning to determine the number of…

Computation and Language · Computer Science 2018-05-16 Xiaodong Liu, Yelong Shen, Kevin Duh, Jianfeng Gao

Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding

Attention-based models have shown significant improvement over traditional algorithms in several NLP tasks. The Transformer, for instance, is an illustrative example that generates abstract representations of tokens inputted to an encoder…

Computation and Language · Computer Science 2019-11-15 Dhanasekar Sundararaman, Vivek Subramanian, Guoyin Wang, Shijing Si, Dinghan Shen, Dong Wang, Lawrence Carin

Self-Attention Networks for Intent Detection

Self-attention networks (SAN) have shown promising performance in various Natural Language Processing (NLP) scenarios, especially in machine translation. One of the main points of SANs is the strength of capturing long-range and multi-scale…

Computation and Language · Computer Science 2020-06-30 Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth

SANVis: Visual Analytics for Understanding Self-Attention Networks

Attention networks, a deep neural network architecture inspired by humans' attention mechanism, have seen significant success in image captioning, machine translation, and many other applications. Recently, they have been further evolved…

Computation and Language · Computer Science 2019-09-23 Cheonbok Park, Inyoup Na, Yongjang Jo, Sungbok Shin, Jaehyo Yoo, Bum Chul Kwon, Jian Zhao, Hyungjong Noh, Yeonsoo Lee, Jaegul Choo

Improve Retrieval-based Dialogue System via Syntax-Informed Attention

Multi-turn response selection is a challenging task due to its high demands on efficient extraction of the matching features from abundant information provided by context utterances. Since incorporating syntactic information like dependency…

Artificial Intelligence · Computer Science 2023-03-14 Tengtao Song, Nuo Chen, Ji Jiang, Zhihong Zhu, Yuexian Zou

S2Sent: Nested Selectivity Aware Sentence Representation Learning

The combination of Transformer-based encoders with contrastive learning represents the current mainstream paradigm for sentence representation learning. This paradigm is typically based on the hidden states of the last Transformer block of…

Computation and Language · Computer Science 2025-08-26 Jianxiang Zang, Nijia Mo, Yonda Wei, Meiling Ning, Hui Liu

GATology for Linguistics: What Syntactic Dependencies It Knows

Graph Attention Network (GAT) is a graph neural network which is one of the strategies for modeling and representing explicit syntactic knowledge and can work with pre-trained models, such as BERT, in downstream tasks. Currently, there is…

Computation and Language · Computer Science 2023-05-24 Yuqian Dai, Serge Sharoff, Marc de Kamps

Self-Attention Transducers for End-to-End Speech Recognition

Recurrent neural network transducers (RNN-T) have been successfully applied in end-to-end speech recognition. However, the recurrent structure makes it difficult for parallelization. In this paper, we propose a self-attention transducer…

Audio and Speech Processing · Electrical Eng. & Systems 2020-02-25 Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengqi Wen

Self-Attention with Structural Position Representations

Although self-attention networks (SANs) have advanced the state-of-the-art on various NLP tasks, one criticism of SANs is their ability of encoding positions of input words (Shaw et al., 2018). In this work, we propose to augment SANs with…

Computation and Language · Computer Science 2019-09-04 Xing Wang, Zhaopeng Tu, Longyue Wang, Shuming Shi

Enhancing Machine Translation with Dependency-Aware Self-Attention

Most neural machine translation models only rely on pairs of parallel sentences, assuming syntactic information is automatically learned by an attention mechanism. In this work, we investigate different approaches to incorporate syntactic…

Computation and Language · Computer Science 2020-04-22 Emanuele Bugliarello, Naoaki Okazaki

Convolutional Self-Attention Networks

Self-attention networks (SANs) have drawn increasing interest due to their high parallelization in computation and flexibility in modeling dependencies. SANs can be further enhanced with multi-head attention by allowing the model to attend…

Computation and Language · Computer Science 2019-04-08 Baosong Yang, Longyue Wang, Derek Wong, Lidia S. Chao, Zhaopeng Tu