Related papers: Muppet: Massive Multi-task Representations with Pr…

200 papers

State-of-the-art performance on language understanding tasks is now achieved with increasingly large networks; the current record holder has billions of parameters. Given a language model pre-trained on massive unlabeled text corpora, only…

Computation and Language · Computer Science 2020-04-30 Evani Radiya-Dixit, Xin Wang

Recent work demonstrates the potential of multilingual pretraining of creating one model that can be used for various tasks in different languages. Previous work in multilingual pretraining has demonstrated that machine translation systems…

Computation and Language · Computer Science 2020-08-04 Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan

Recently, Language Models (LMs) instruction-tuned on multiple tasks, also known as multitask-prompted fine-tuning (MT), have shown the capability to generalize to unseen tasks. Previous work has shown that scaling the number of training…

Computation and Language · Computer Science 2023-02-10 Joel Jang, Seungone Kim, Seonghyeon Ye, Doyoung Kim, Lajanugen Logeswaran, Moontae Lee, Kyungjae Lee, Minjoon Seo

The development of large language models leads to the formation of a pre-train-then-align paradigm, in which the model is typically pre-trained on a large text corpus and undergoes a tuning stage to align the model with human preference or…

Computation and Language · Computer Science 2025-03-19 Kaiser Sun, Mark Dredze

Language models (LMs) trained on vast quantities of unlabelled data have greatly advanced the field of natural language processing (NLP). In this study, we re-visit the widely accepted notion in NLP that continued pre-training LMs on…

Computation and Language · Computer Science 2023-10-09 Zhengxiang Shi, Aldo Lipani

Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing…

Computation and Language · Computer Science 2020-02-06 Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang

Most uses of machine learning today involve training a model from scratch for a particular task, or sometimes starting with a model pretrained on a related task and then fine-tuning on a downstream task. Both approaches offer limited…

Machine Learning · Computer Science 2022-05-26 Andrea Gesmundo, Jeff Dean

While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target…

Intermediate task fine-tuning has been shown to culminate in large transfer gains across many NLP tasks. With an abundance of candidate datasets as well as pre-trained language models, it has become infeasible to run the cross-product of…

Computation and Language · Computer Science 2021-09-13 Clifton Poth, Jonas Pfeiffer, Andreas Rücklé, Iryna Gurevych

Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training. Due to the cost of applying such models to…

Computation and Language · Computer Science 2019-09-27 Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning. Are these models applying reasoning skills they have learnt during pre-training and reason outside of…

Computation and Language · Computer Science 2023-10-02 Ping Yu, Tianlu Wang, Olga Golovneva, Badr AlKhamissi, Siddharth Verma, Zhijing Jin, Gargi Ghosh, Mona Diab, Asli Celikyilmaz

With the ever-increasing number of pretrained models, machine learning practitioners are continuously faced with which pretrained model to use, and how to finetune it for a new dataset. In this paper, we propose a methodology that jointly…

Machine Learning · Computer Science 2024-02-26 Sebastian Pineda Arango, Fabio Ferreira, Arlind Kadra, Frank Hutter, Josif Grabocka

Recently, fine-tuning pre-trained language models (e.g., multilingual BERT) to downstream cross-lingual tasks has shown promising results. However, the fine-tuning process inevitably changes the parameters of the pre-trained model and…

Computation and Language · Computer Science 2020-10-06 Zihan Liu, Genta Indra Winata, Andrea Madotto, Pascale Fung

Multi-stage training and knowledge transfer, from a large-scale pretraining task to various finetuning tasks, have revolutionized natural language processing and computer vision resulting in state-of-the-art performance improvements. In…

Machine Learning · Computer Science 2020-07-20 Hongge Chen, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh

Supplementary Training on Intermediate Labeled-data Tasks (STILTs) is a widely applied technique, which first fine-tunes the pretrained language models on an intermediate task before on the target task of interest. While STILTs is able to…

Computation and Language · Computer Science 2021-09-02 Ting-Yun Chang, Chi-Jen Lu

Recent state-of-the-art language models utilize a two-phase training procedure comprised of (i) unsupervised pre-training on unlabeled text, and (ii) fine-tuning for a specific supervised task. More recently, many studies have been focused…

Computation and Language · Computer Science 2019-11-15 Itzik Malkiel, Lior Wolf

Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks. However, it is unclear why the pre-training-then-fine-tuning paradigm can improve performance and generalization capability across different…

Computation and Language · Computer Science 2019-08-16 Yaru Hao, Li Dong, Furu Wei, Ke Xu

Representation learning has been widely studied in the context of meta-learning, enabling rapid learning of new tasks through shared representations. Recent works such as MAML have explored using fine-tuning-based metrics, which measure the…

Machine Learning · Computer Science 2021-05-06 Kurtland Chua, Qi Lei, Jason D. Lee

Large language models are classically trained in stages: pretraining on raw text followed by post-training for instruction following and reasoning. However, this separation creates a fundamental limitation: many desirable behaviors such as…

One reason pretraining on self-supervised linguistic tasks is effective is that it teaches models features that are helpful for language understanding. However, we want pretrained models to learn not only to represent linguistic features,…

Computation and Language · Computer Science 2020-10-13 Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman