Related papers: Syntax-Enhanced Pre-trained Model

Syntax-BERT: Improving Pre-trained Transformers with Syntax Trees

Pre-trained language models like BERT achieve superior performances in various NLP tasks without explicit consideration of syntactic information. Meanwhile, syntactic information has been proved to be crucial for the success of NLP…

Computation and Language · Computer Science 2021-03-09 Jiangang Bai, Yujing Wang, Yiren Chen, Yaming Yang, Jing Bai, Jing Yu, Yunhai Tong

Do Syntax Trees Help Pre-trained Transformers Extract Information?

Much recent work suggests that incorporating syntax information from dependency trees can improve task-specific transformer models. However, the effect of incorporating dependency tree information into pre-trained transformer models (e.g.,…

Computation and Language · Computer Science 2021-01-28 Devendra Singh Sachan, Yuhao Zhang, Peng Qi, William Hamilton

Improving BERT with Syntax-aware Local Attention

Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on varieties of NLP tasks. Recent works have shown that attention-based models can benefit from more focused attention over local regions.…

Computation and Language · Computer Science 2021-05-25 Zhongli Li, Qingyu Zhou, Chao Li, Ke Xu, Yunbo Cao

How much pretraining data do language models need to learn syntax?

Transformers-based pretrained language models achieve outstanding results in many well-known NLU benchmarks. However, while pretraining methods are very convenient, they are expensive in terms of time and resources. This calls for a study…

Computation and Language · Computer Science 2021-09-10 Laura Pérez-Mayos, Miguel Ballesteros, Leo Wanner

Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach

It is commonly believed that knowledge of syntactic structure should improve language modeling. However, effectively and computationally efficiently incorporating syntactic structure into neural language models has been a challenging topic.…

Computation and Language · Computer Science 2020-05-13 Wenyu Du, Zhouhan Lin, Yikang Shen, Timothy J. O'Donnell, Yoshua Bengio, Yue Zhang

Retrofitting Structure-aware Transformer Language Model for End Tasks

We consider retrofitting structure-aware Transformer-based language model for facilitating end tasks by proposing to exploit syntactic distance to encode both the phrasal constituency and dependency connection into the language model. A…

Computation and Language · Computer Science 2020-09-17 Hao Fei, Yafeng Ren, Donghong Ji

Syntax-guided Localized Self-attention by Constituency Syntactic Distance

Recent works have revealed that Transformers implicitly learn syntactic information in their lower layers from data, although this is highly dependent on the quality and scale of the training data. However, learning syntactic information…

Computation and Language · Computer Science 2022-10-24 Shengyuan Hou, Jushi Kai, Haotian Xue, Bingyu Zhu, Bo Yuan, Longtao Huang, Xinbing Wang, Zhouhan Lin

Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed

The utility of linguistic annotation in neural machine translation seemed to have been established in past papers. The experiments were, however, limited to recurrent sequence-to-sequence architectures and relatively small data settings. We…

Computation and Language · Computer Science 2019-10-25 Thuong-Hai Pham, Dominik Macháček, Ondřej Bojar

Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations

Models need appropriate inductive biases to effectively learn from small amounts of data and generalize systematically outside of the training distribution. While Transformers are highly versatile and powerful, they can still benefit from…

Computation and Language · Computer Science 2024-07-08 Matthias Lindemann, Alexander Koller, Ivan Titov

Learning Better Sentence Representation with Syntax Information

Sentence semantic understanding is a key topic in the field of natural language processing. Recently, contextualized word representations derived from pre-trained language models such as ELMO and BERT have shown significant improvements for…

Computation and Language · Computer Science 2021-01-12 Chen Yang

Syntax-informed Question Answering with Heterogeneous Graph Transformer

Large neural language models are steadily contributing state-of-the-art performance to question answering and other natural language and information processing tasks. These models are expensive to train. We propose to evaluate whether such…

Computation and Language · Computer Science 2022-05-24 Fangyi Zhu, Lok You Tan, See-Kiong Ng, Stéphane Bressan

Using Prior Knowledge to Guide BERT's Attention in Semantic Textual Matching Tasks

We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT), to enhance its performance on semantic textual matching tasks. By probing and…

Computation and Language · Computer Science 2021-02-23 Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang

Syntax-augmented Multilingual BERT for Cross-lingual Transfer

In recent years, we have seen a colossal effort in pre-training multilingual text encoders using large-scale corpora in many languages to facilitate cross-lingual transfer learning. However, due to typological differences across languages,…

Computation and Language · Computer Science 2021-06-07 Wasi Uddin Ahmad, Haoran Li, Kai-Wei Chang, Yashar Mehdad

Extracting Sentence Embeddings from Pretrained Transformer Models

Pre-trained transformer models shine in many natural language processing tasks and therefore are expected to bear the representation of the input sentence or text meaning. These sentence-level embeddings are also important in…

Computation and Language · Computer Science 2025-02-21 Lukas Stankevičius, Mantas Lukoševičius

Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding

Attention-based models have shown significant improvement over traditional algorithms in several NLP tasks. The Transformer, for instance, is an illustrative example that generates abstract representations of tokens inputted to an encoder…

Computation and Language · Computer Science 2019-11-15 Dhanasekar Sundararaman, Vivek Subramanian, Guoyin Wang, Shijing Si, Dinghan Shen, Dong Wang, Lawrence Carin

Do Attention Heads in BERT Track Syntactic Dependencies?

We investigate the extent to which individual attention heads in pretrained transformer language models, such as BERT and RoBERTa, implicitly capture syntactic dependency relations. We employ two methods, taking the maximum attention…

Computation and Language · Computer Science 2019-11-28 Phu Mon Htut, Jason Phang, Shikha Bordia, Samuel R. Bowman

Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis

Exploiting rich linguistic information in raw text is crucial for expressive text-to-speech (TTS). As large scale pre-trained text representation develops, bidirectional encoder representations from Transformers (BERT) has been proven to…

Computation and Language · Computer Science 2022-11-14 Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng

How to Fine-Tune BERT for Text Classification?

Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing…

Computation and Language · Computer Science 2020-02-06 Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang

Exploiting Sentence-Level Representations for Passage Ranking

Recently, pre-trained contextual models, such as BERT, have been shown to perform well in language-related tasks. We revisit the design decisions that govern the applicability of these models for the passage re-ranking task in open-domain…

Information Retrieval · Computer Science 2021-08-31 Jurek Leonhardt, Fabian Beringer, Avishek Anand

On the Evolution of Syntactic Information Encoded by BERT's Contextualized Representations

The adaptation of pretrained language models to solve supervised tasks has become a baseline in NLP, and many recent works have focused on studying how linguistic information is encoded in the pretrained sentence representations. Among…

Computation and Language · Computer Science 2021-02-11 Laura Pérez-Mayos, Roberto Carlini, Miguel Ballesteros, Leo Wanner