Related papers: Syntax-Enhanced Pre-trained Model
Pre-trained language models like BERT achieve superior performances in various NLP tasks without explicit consideration of syntactic information. Meanwhile, syntactic information has been proved to be crucial for the success of NLP…
Much recent work suggests that incorporating syntax information from dependency trees can improve task-specific transformer models. However, the effect of incorporating dependency tree information into pre-trained transformer models (e.g.,…
Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on varieties of NLP tasks. Recent works have shown that attention-based models can benefit from more focused attention over local regions.…
Transformers-based pretrained language models achieve outstanding results in many well-known NLU benchmarks. However, while pretraining methods are very convenient, they are expensive in terms of time and resources. This calls for a study…
It is commonly believed that knowledge of syntactic structure should improve language modeling. However, effectively and computationally efficiently incorporating syntactic structure into neural language models has been a challenging topic.…
We consider retrofitting structure-aware Transformer-based language model for facilitating end tasks by proposing to exploit syntactic distance to encode both the phrasal constituency and dependency connection into the language model. A…
Recent works have revealed that Transformers are implicitly learning the syntactic information in its lower layers from data, albeit is highly dependent on the quality and scale of the training data. However, learning syntactic information…
The utility of linguistic annotation in neural machine translation seemed to had been established in past papers. The experiments were however limited to recurrent sequence-to-sequence architectures and relatively small data settings. We…
Models need appropriate inductive biases to effectively learn from small amounts of data and generalize systematically outside of the training distribution. While Transformers are highly versatile and powerful, they can still benefit from…
Sentence semantic understanding is a key topic in the field of natural language processing. Recently, contextualized word representations derived from pre-trained language models such as ELMO and BERT have shown significant improvements for…
Large neural language models are steadily contributing state-of-the-art performance to question answering and other natural language and information processing tasks. These models are expensive to train. We propose to evaluate whether such…
We study the problem of incorporating prior knowledge into a deep Transformer-based model,i.e.,Bidirectional Encoder Representations from Transformers (BERT), to enhance its performance on semantic textual matching tasks. By probing and…
In recent years, we have seen a colossal effort in pre-training multilingual text encoders using large-scale corpora in many languages to facilitate cross-lingual transfer learning. However, due to typological differences across languages,…
Pre-trained transformer models shine in many natural language processing tasks and therefore are expected to bear the representation of the input sentence or text meaning. These sentence-level embeddings are also important in…
Attention-based models have shown significant improvement over traditional algorithms in several NLP tasks. The Transformer, for instance, is an illustrative example that generates abstract representations of tokens inputted to an encoder…
We investigate the extent to which individual attention heads in pretrained transformer language models, such as BERT and RoBERTa, implicitly capture syntactic dependency relations. We employ two methods---taking the maximum attention…
Exploiting rich linguistic information in raw text is crucial for expressive text-to-speech (TTS). As large scale pre-trained text representation develops, bidirectional encoder representations from Transformers (BERT) has been proven to…
Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing…
Recently, pre-trained contextual models, such as BERT, have shown to perform well in language related tasks. We revisit the design decisions that govern the applicability of these models for the passage re-ranking task in open-domain…
The adaptation of pretrained language models to solve supervised tasks has become a baseline in NLP, and many recent works have focused on studying how linguistic information is encoded in the pretrained sentence representations. Among…