Related papers: Syntax-BERT: Improving Pre-trained Transformers wi…

Syntax-Enhanced Pre-trained Model

We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa. Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning stage, so that they…

Computation and Language · Computer Science 2021-06-01 Zenan Xu , Daya Guo , Duyu Tang , Qinliang Su , Linjun Shou , Ming Gong , Wanjun Zhong , Xiaojun Quan , Nan Duan , Daxin Jiang

A Comprehensive Comparison of Pre-training Language Models

Recently, the development of pre-trained language models has brought natural language processing (NLP) tasks to the new state-of-the-art. In this paper we explore the efficiency of various pre-trained language models. We pre-train a list of…

Computation and Language · Computer Science 2023-07-27 Tong Guo

Do Syntax Trees Help Pre-trained Transformers Extract Information?

Much recent work suggests that incorporating syntax information from dependency trees can improve task-specific transformer models. However, the effect of incorporating dependency tree information into pre-trained transformer models (e.g.,…

Computation and Language · Computer Science 2021-01-28 Devendra Singh Sachan , Yuhao Zhang , Peng Qi , William Hamilton

Improving BERT with Syntax-aware Local Attention

Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on varieties of NLP tasks. Recent works have shown that attention-based models can benefit from more focused attention over local regions.…

Computation and Language · Computer Science 2021-05-25 Zhongli Li , Qingyu Zhou , Chao Li , Ke Xu , Yunbo Cao

Syntax-augmented Multilingual BERT for Cross-lingual Transfer

In recent years, we have seen a colossal effort in pre-training multilingual text encoders using large-scale corpora in many languages to facilitate cross-lingual transfer learning. However, due to typological differences across languages,…

Computation and Language · Computer Science 2021-06-07 Wasi Uddin Ahmad , Haoran Li , Kai-Wei Chang , Yashar Mehdad

Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding

Attention-based models have shown significant improvement over traditional algorithms in several NLP tasks. The Transformer, for instance, is an illustrative example that generates abstract representations of tokens inputted to an encoder…

Computation and Language · Computer Science 2019-11-15 Dhanasekar Sundararaman , Vivek Subramanian , Guoyin Wang , Shijing Si , Dinghan Shen , Dong Wang , Lawrence Carin

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

Large, pre-trained transformer-based language models such as BERT have drastically changed the Natural Language Processing (NLP) field. We present a survey of recent work that uses these large language models to solve NLP tasks via…

Computation and Language · Computer Science 2021-11-03 Bonan Min , Hayley Ross , Elior Sulem , Amir Pouran Ben Veyseh , Thien Huu Nguyen , Oscar Sainz , Eneko Agirre , Ilana Heinz , Dan Roth

On the Evolution of Syntactic Information Encoded by BERT's Contextualized Representations

The adaptation of pretrained language models to solve supervised tasks has become a baseline in NLP, and many recent works have focused on studying how linguistic information is encoded in the pretrained sentence representations. Among…

Computation and Language · Computer Science 2021-02-11 Laura Pérez-Mayos , Roberto Carlini , Miguel Ballesteros , Leo Wanner

On the Effectiveness of Transfer Learning for Code Search

The Transformer architecture and transfer learning have marked a quantum leap in natural language processing, improving the state of the art across a range of text-based tasks. This paper examines how these advancements can be applied to…

Software Engineering · Computer Science 2022-08-29 Pasquale Salza , Christoph Schwizer , Jian Gu , Harald C. Gall

TreeBERT: A Tree-Based Pre-Trained Model for Programming Language

Source code can be parsed into the abstract syntax tree (AST) based on defined syntax rules. However, in pre-training, little work has considered the incorporation of tree structure into the learning process. In this paper, we present…

Machine Learning · Computer Science 2021-07-16 Xue Jiang , Zhuoran Zheng , Chen Lyu , Liang Li , Lei Lyu

Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation

Fine-tuning pre-trained language models like BERT has become an effective way in NLP and yields state-of-the-art results on many downstream tasks. Recent studies on adapting BERT to new tasks mainly focus on modifying the model structure,…

Computation and Language · Computer Science 2020-02-25 Yige Xu , Xipeng Qiu , Ligao Zhou , Xuanjing Huang

Visualizing and Understanding the Effectiveness of BERT

Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks. However, it is unclear why the pre-training-then-fine-tuning paradigm can improve performance and generalization capability across different…

Computation and Language · Computer Science 2019-08-16 Yaru Hao , Li Dong , Furu Wei , Ke Xu

Tree-Planted Transformers: Unidirectional Transformer Language Models with Implicit Syntactic Supervision

Syntactic Language Models (SLMs) can be trained efficiently to reach relatively high performance; however, they have trouble with inference efficiency due to the explicit generation of syntactic structures. In this paper, we propose a new…

Computation and Language · Computer Science 2025-08-20 Ryo Yoshida , Taiga Someya , Yohei Oseki

Advancements in Natural Language Processing: Exploring Transformer-Based Architectures for Text Understanding

Natural Language Processing (NLP) has witnessed a transformative leap with the advent of transformer-based architectures, which have significantly enhanced the ability of machines to understand and generate human-like text. This paper…

Computation and Language · Computer Science 2025-03-27 Tianhao Wu , Yu Wang , Ngoc Quach

BERTSel: Answer Selection with Pre-trained Models

Recently, pre-trained models have been the dominant paradigm in natural language processing. They achieved remarkable state-of-the-art performance across a wide range of related tasks, such as textual entailment, natural language inference,…

Computation and Language · Computer Science 2019-05-21 Dongfang Li , Yifei Yu , Qingcai Chen , Xinyu Li

Robust Transfer Learning with Pretrained Language Models through Adapters

Transfer learning with large pretrained transformer-based language models like BERT has become a dominating approach for most NLP tasks. Simply fine-tuning those large language models on downstream tasks or combining it with task-specific…

Computation and Language · Computer Science 2021-08-06 Wenjuan Han , Bo Pang , Yingnian Wu

Using Prior Knowledge to Guide BERT's Attention in Semantic Textual Matching Tasks

We study the problem of incorporating prior knowledge into a deep Transformer-based model,i.e.,Bidirectional Encoder Representations from Transformers (BERT), to enhance its performance on semantic textual matching tasks. By probing and…

Computation and Language · Computer Science 2021-02-23 Tingyu Xia , Yue Wang , Yuan Tian , Yi Chang

Transformers: "The End of History" for NLP?

Recent advances in neural architectures, such as the Transformer, coupled with the emergence of large-scale pre-trained models such as BERT, have revolutionized the field of Natural Language Processing (NLP), pushing the state of the art…

Computation and Language · Computer Science 2021-09-24 Anton Chernyavskiy , Dmitry Ilvovsky , Preslav Nakov

Better Neural Machine Translation by Extracting Linguistic Information from BERT

Adding linguistic information (syntax or semantics) to neural machine translation (NMT) has mostly focused on using point estimates from pre-trained models. Directly using the capacity of massive pre-trained contextual word embedding models…

Computation and Language · Computer Science 2021-04-08 Hassan S. Shavarani , Anoop Sarkar

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Recently, the pre-trained language model, BERT (and its robustly optimized version RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as…

Computation and Language · Computer Science 2019-09-30 Wei Wang , Bin Bi , Ming Yan , Chen Wu , Zuyi Bao , Jiangnan Xia , Liwei Peng , Luo Si