Related papers: Structural Self-Supervised Objectives for Transfor…

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced…

Computation and Language · Computer Science 2021-09-07 Atsuki Yamaguchi , George Chrysostomou , Katerina Margatina , Nikolaos Aletras

Exploring Unsupervised Pretraining Objectives for Machine Translation

Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT), by drastically reducing the need for large parallel data. Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence…

Computation and Language · Computer Science 2021-06-11 Christos Baziotis , Ivan Titov , Alexandra Birch , Barry Haddow

Automating Code-Related Tasks Through Transformers: The Impact of Pre-training

Transformers have gained popularity in the software engineering (SE) literature. These deep learning models are usually pre-trained through a self-supervised objective, meant to provide the model with basic knowledge about a language of…

Software Engineering · Computer Science 2023-02-09 Rosalia Tufano , Luca Pascarella , Gabriele Bavota

Benchmarking down-scaled (not so large) pre-trained language models

Large Transformer-based language models are pre-trained on corpora of varying sizes, for a different number of steps and with different batch sizes. At the same time, more fundamental components, such as the pre-training objective or…

Computation and Language · Computer Science 2021-05-12 M. Aßenmacher , P. Schulze , C. Heumann

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to…

Computation and Language · Computer Science 2020-03-25 Kevin Clark , Minh-Thang Luong , Quoc V. Le , Christopher D. Manning

Cross-Lingual Supervision improves Large Language Models Pre-training

The recent rapid progress in pre-training Large Language Models has relied on using self-supervised language modeling objectives like next token prediction or span corruption. On the other hand, Machine Translation Systems are mostly…

Computation and Language · Computer Science 2023-05-22 Andrea Schioppa , Xavier Garcia , Orhan Firat

SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training

Unsupervised pre-training is now the predominant approach for both text and speech understanding. Self-attention models pre-trained on large amounts of unannotated data have been hugely successful when fine-tuned on downstream tasks from a…

Computation and Language · Computer Science 2021-10-22 Ankur Bapna , Yu-an Chung , Nan Wu , Anmol Gulati , Ye Jia , Jonathan H. Clark , Melvin Johnson , Jason Riesa , Alexis Conneau , Yu Zhang

How does the pre-training objective affect what large language models learn about linguistic properties?

Several pre-training objectives, such as masked language modeling (MLM), have been proposed to pre-train language models (e.g. BERT) with the aim of learning better language representations. However, to the best of our knowledge, no…

Computation and Language · Computer Science 2022-03-22 Ahmed Alajrami , Nikolaos Aletras

On Losses for Modern Language Models

BERT set many state-of-the-art results over varied NLU benchmarks by pre-training over two tasks: masked language modelling (MLM) and next sentence prediction (NSP), the latter of which has been highly criticized. In this paper, we 1)…

Computation and Language · Computer Science 2020-10-06 Stephane Aroca-Ouellette , Frank Rudzicz

MST: Masked Self-Supervised Transformer for Visual Representation

Transformer has been widely used for self-supervised pre-training in Natural Language Processing (NLP) and achieved great success. However, it has not been fully explored in visual self-supervised learning. Meanwhile, previous methods only…

Computer Vision and Pattern Recognition · Computer Science 2021-10-26 Zhaowen Li , Zhiyang Chen , Fan Yang , Wei Li , Yousong Zhu , Chaoyang Zhao , Rui Deng , Liwei Wu , Rui Zhao , Ming Tang , Jinqiao Wang

Forging Multiple Training Objectives for Pre-trained Language Models via Meta-Learning

Multiple pre-training objectives fill the vacancy of the understanding capability of single-objective language modeling, which serves the ultimate purpose of pre-trained language models (PrLMs), generalizing well on a mass of scenarios.…

Computation and Language · Computer Science 2022-10-20 Hongqiu Wu , Ruixue Ding , Hai Zhao , Boli Chen , Pengjun Xie , Fei Huang , Min Zhang

A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives

Pretrained language models (PLMs) display impressive performances and have captured the attention of the NLP community. Establishing best practices in pretraining has, therefore, become a major focus of NLP research, especially since…

Computation and Language · Computer Science 2024-10-08 Zihao Li , Shaoxiong Ji , Timothee Mickus , Vincent Segonne , Jörg Tiedemann

SLM: Learning a Discourse Language Representation with Sentence Unshuffling

We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation in a fully self-supervised manner. Recent pre-training methods in NLP focus on learning either bottom or top-level…

Computation and Language · Computer Science 2020-11-02 Haejun Lee , Drew A. Hudson , Kangwook Lee , Christopher D. Manning

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Self-supervised pre-training of transformer models has revolutionized NLP applications. Such pre-training with language modeling objectives provides a useful initial point for parameters that generalize well to new tasks with fine-tuning.…

Computation and Language · Computer Science 2020-11-17 Trapit Bansal , Rishikesh Jha , Tsendsuren Munkhdalai , Andrew McCallum

Data Efficient Masked Language Modeling for Vision and Language

Masked language modeling (MLM) is one of the key sub-tasks in vision-language pretraining. In the cross-modal setting, tokens in the sentence are masked at random, and the model predicts the masked tokens given the image and the text. In…

Computation and Language · Computer Science 2021-09-07 Yonatan Bitton , Gabriel Stanovsky , Michael Elhadad , Roy Schwartz

Using Selective Masking as a Bridge between Pre-training and Fine-tuning

Pre-training a language model and then fine-tuning it for downstream tasks has demonstrated state-of-the-art results for various NLP tasks. Pre-training is usually independent of the downstream task, and previous works have shown that this…

Computation and Language · Computer Science 2022-11-28 Tanish Lad , Himanshu Maheshwari , Shreyas Kottukkal , Radhika Mamidi

PERT: Pre-training BERT with Permuted Language Model

Pre-trained Language Models (PLMs) have been widely used in various natural language processing (NLP) tasks, owing to their powerful text representations trained on large-scale corpora. In this paper, we propose a new PLM called PERT for…

Computation and Language · Computer Science 2022-03-15 Yiming Cui , Ziqing Yang , Ting Liu

Position Prediction as an Effective Pretraining Strategy

Transformers have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision and Speech Recognition, because of their powerful representational capacity. However, harnessing…

Machine Learning · Computer Science 2022-07-18 Shuangfei Zhai , Navdeep Jaitly , Jason Ramapuram , Dan Busbridge , Tatiana Likhomanenko , Joseph Yitan Cheng , Walter Talbott , Chen Huang , Hanlin Goh , Joshua Susskind

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT. These models are built on the top of transformers, self-supervised…

Computation and Language · Computer Science 2021-08-31 Katikapalli Subramanyam Kalyan , Ajit Rajasekharan , Sivanesan Sangeetha

SimpleBERT: A Pre-trained Model That Learns to Generate Simple Words

Pre-trained models are widely used in the tasks of natural language processing nowadays. However, in the specific field of text simplification, the research on improving pre-trained models is still blank. In this work, we propose a…

Computation and Language · Computer Science 2022-04-19 Renliang Sun , Xiaojun Wan