Related papers: Incorporating BERT into Parallel Sequence Decoding…

BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation

The success of bidirectional encoders using masked language models, such as BERT, on numerous natural language processing tasks has prompted researchers to attempt to incorporate these pre-trained models into neural machine translation…

Computation and Language · Computer Science 2021-09-13 Haoran Xu, Benjamin Van Durme, Kenton Murray

Incorporating BERT into Neural Machine Translation

The recently proposed BERT has shown great power on a variety of natural language understanding tasks, such as text classification, reading comprehension, etc. However, how to effectively apply BERT to neural machine translation (NMT) lacks…

Computation and Language · Computer Science 2020-02-18 Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study

Large pre-trained language models help to achieve state of the art on a variety of natural language processing (NLP) tasks; nevertheless, they still suffer from forgetting when incrementally learning a sequence of tasks. To alleviate this…

Computation and Language · Computer Science 2023-03-03 Mingxu Tao, Yansong Feng, Dongyan Zhao

BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining

BERT (Bidirectional Encoder Representations from Transformers) has revolutionized the field of natural language processing through its exceptional performance on numerous tasks. Yet, the majority of researchers have mainly concentrated on…

Computation and Language · Computer Science 2024-12-11 Wen Liang, Youzhi Liang

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional…

Computation and Language · Computer Science 2019-05-28 Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science 2020-02-11 Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

What the [MASK]? Making Sense of Language-Specific BERT Models

Recently, Natural Language Processing (NLP) has witnessed impressive progress in many areas, due to the advent of novel, pretrained contextual representation models. In particular, Devlin et al. (2019) proposed a model, called BERT…

Computation and Language · Computer Science 2020-03-09 Debora Nozza, Federico Bianchi, Dirk Hovy

BoostingBERT: Integrating Multi-Class Boosting into BERT for NLP Tasks

As a pre-trained Transformer model, BERT (Bidirectional Encoder Representations from Transformers) has achieved ground-breaking performance on multiple NLP tasks. On the other hand, Boosting is a popular ensemble learning technique which…

Computation and Language · Computer Science 2020-09-15 Tongwen Huang, Qingyun She, Junlin Zhang

A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models

Undirected neural sequence models such as BERT (Devlin et al., 2019) have received renewed interest due to their success on discriminative natural language understanding tasks such as question-answering and natural language inference. The…

Machine Learning · Computer Science 2020-02-10 Elman Mansimov, Alex Wang, Sean Welleck, Kyunghyun Cho

Improving BERT with Hybrid Pooling Network and Drop Mask

Transformer-based pre-trained language models, such as BERT, achieve great success in various natural language understanding tasks. Prior research found that BERT captures a rich hierarchy of linguistic information at different layers.…

Computation and Language · Computer Science 2023-07-17 Qian Chen, Wen Wang, Qinglin Zhang, Chong Deng, Ma Yukun, Siqi Zheng

Language-agnostic BERT Sentence Embedding

While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding based transfer learning (Reimers and Gurevych, 2019), BERT based cross-lingual sentence embeddings have yet to be explored.…

Computation and Language · Computer Science 2022-03-09 Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang

Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges

Recent years have witnessed a substantial increase in the use of deep learning to solve various natural language processing (NLP) problems. Early deep learning models were constrained by their sequential or unidirectional nature, such that…

Information Retrieval · Computer Science 2024-03-05 Jiajia Wang, Jimmy X. Huang, Xinhui Tu, Junmei Wang, Angela J. Huang, Md Tahmid Rahman Laskar, Amran Bhuiyan

BERT-DRE: BERT with Deep Recursive Encoder for Natural Language Sentence Matching

This paper presents a deep neural architecture for Natural Language Sentence Matching (NLSM) that adds a deep recursive encoder to BERT, called BERT with Deep Recursive Encoder (BERT-DRE). Our analysis of model behavior shows that BERT…

Computation and Language · Computer Science 2021-11-05 Ehsan Tavan, Ali Rahmati, Maryam Najafi, Saeed Bibak, Zahed Rahmati

Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input

Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve significant inference speedup but at the cost of inferior accuracy compared to autoregressive…

Computation and Language · Computer Science 2018-12-27 Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, Tie-Yan Liu

DecBERT: Enhancing the Language Understanding of BERT with Causal Attention Masks

Since 2017, Transformer-based models have played critical roles in various downstream Natural Language Processing tasks. However, a common limitation of the attention mechanism utilized in the Transformer Encoder is that it cannot automatically…

Computation and Language · Computer Science 2022-04-20 Ziyang Luo, Yadong Xi, Jing Ma, Zhiwei Yang, Xiaoxi Mao, Changjie Fan, Rongsheng Zhang

BERTSel: Answer Selection with Pre-trained Models

Recently, pre-trained models have been the dominant paradigm in natural language processing. They achieved remarkable state-of-the-art performance across a wide range of related tasks, such as textual entailment, natural language inference,…

Computation and Language · Computer Science 2019-05-21 Dongfang Li, Yifei Yu, Qingcai Chen, Xinyu Li

NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search

While pre-trained language models (e.g., BERT) have achieved impressive results on different natural language processing tasks, they have large numbers of parameters and suffer from big computational and memory costs, which make them…

Computation and Language · Computer Science 2021-06-01 Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu

Using Prior Knowledge to Guide BERT's Attention in Semantic Textual Matching Tasks

We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT), to enhance its performance on semantic textual matching tasks. By probing and…

Computation and Language · Computer Science 2021-02-23 Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang

E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce

Pre-trained language models such as BERT have achieved great success in a broad range of natural language processing tasks. However, BERT cannot adequately support E-commerce related tasks due to the lack of two levels of domain knowledge, i.e.,…

Computation and Language · Computer Science 2021-12-20 Denghui Zhang, Zixuan Yuan, Yanchi Liu, Fuzhen Zhuang, Haifeng Chen, Hui Xiong

Robust Transfer Learning with Pretrained Language Models through Adapters

Transfer learning with large pretrained transformer-based language models like BERT has become a dominating approach for most NLP tasks. Simply fine-tuning those large language models on downstream tasks or combining them with task-specific…

Computation and Language · Computer Science 2021-08-06 Wenjuan Han, Bo Pang, Yingnian Wu