Related papers: Muppet: Massive Multi-task Representations with Pr…

How fine can fine-tuning be? Learning efficient language models

State-of-the-art performance on language understanding tasks is now achieved with increasingly large networks; the current record holder has billions of parameters. Given a language model pre-trained on massive unlabeled text corpora, only…

Computation and Language · Computer Science 2020-04-30 Evani Radiya-Dixit, Xin Wang

Multilingual Translation with Extensible Multilingual Pretraining and Finetuning

Recent work demonstrates the potential of multilingual pretraining of creating one model that can be used for various tasks in different languages. Previous work in multilingual pretraining has demonstrated that machine translation systems…

Computation and Language · Computer Science 2020-08-04 Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan

Exploring the Benefits of Training Expert Language Models over Instruction Tuning

Recently, Language Models (LMs) instruction-tuned on multiple tasks, also known as multitask-prompted fine-tuning (MT), have shown the capability to generalize to unseen tasks. Previous work has shown that scaling the number of training…

Computation and Language · Computer Science 2023-02-10 Joel Jang, Seungone Kim, Seonghyeon Ye, Doyoung Kim, Lajanugen Logeswaran, Moontae Lee, Kyungjae Lee, Minjoon Seo

Amuro and Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models

The development of large language models leads to the formation of a pre-train-then-align paradigm, in which the model is typically pre-trained on a large text corpus and undergoes a tuning stage to align the model with human preference or…

Computation and Language · Computer Science 2025-03-19 Kaiser Sun, Mark Dredze

Don't Stop Pretraining? Make Prompt-based Fine-tuning Powerful Learner

Language models (LMs) trained on vast quantities of unlabelled data have greatly advanced the field of natural language processing (NLP). In this study, we re-visit the widely accepted notion in NLP that continued pre-training LMs on…

Computation and Language · Computer Science 2023-10-09 Zhengxiang Shi, Aldo Lipani

How to Fine-Tune BERT for Text Classification?

Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT (Bidirectional Encoder Representations from Transformers) has achieved amazing…

Computation and Language · Computer Science 2020-02-06 Chi Sun, Xipeng Qiu, Yige Xu, Xuanjing Huang

muNet: Evolving Pretrained Deep Neural Networks into Scalable Auto-tuning Multitask Systems

Most uses of machine learning today involve training a model from scratch for a particular task, or sometimes starting with a model pretrained on a related task and then fine-tuning on a downstream task. Both approaches offer limited…

Machine Learning · Computer Science 2022-05-26 Andrea Gesmundo, Jeff Dean

Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?

While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target…

Computation and Language · Computer Science 2020-05-12 Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman

What to Pre-Train on? Efficient Intermediate Task Selection

Intermediate task fine-tuning has been shown to culminate in large transfer gains across many NLP tasks. With an abundance of candidate datasets as well as pre-trained language models, it has become infeasible to run the cross-product of…

Computation and Language · Computer Science 2021-09-13 Clifton Poth, Jonas Pfeiffer, Andreas Rücklé, Iryna Gurevych

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training. Due to the cost of applying such models to…

Computation and Language · Computer Science 2019-09-27 Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

ALERT: Adapting Language Models to Reasoning Tasks

Current large language models can perform reasonably well on complex tasks that require step-by-step reasoning with few-shot learning. Are these models applying reasoning skills they have learnt during pre-training and reason outside of…

Computation and Language · Computer Science 2023-10-02 Ping Yu, Tianlu Wang, Olga Golovneva, Badr AlKhamissi, Siddharth Verma, Zhijing Jin, Gargi Ghosh, Mona Diab, Asli Celikyilmaz

Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How

With the ever-increasing number of pretrained models, machine learning practitioners are continuously faced with which pretrained model to use, and how to finetune it for a new dataset. In this paper, we propose a methodology that jointly…

Machine Learning · Computer Science 2024-02-26 Sebastian Pineda Arango, Fabio Ferreira, Arlind Kadra, Frank Hutter, Josif Grabocka

Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning

Recently, fine-tuning pre-trained language models (e.g., multilingual BERT) to downstream cross-lingual tasks has shown promising results. However, the fine-tuning process inevitably changes the parameters of the pre-trained model and…

Computation and Language · Computer Science 2020-10-06 Zihan Liu, Genta Indra Winata, Andrea Madotto, Pascale Fung

Multi-Stage Influence Function

Multi-stage training and knowledge transfer, from a large-scale pretraining task to various finetuning tasks, have revolutionized natural language processing and computer vision resulting in state-of-the-art performance improvements. In…

Machine Learning · Computer Science 2020-07-20 Hongge Chen, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh

Rethinking Why Intermediate-Task Fine-Tuning Works

Supplementary Training on Intermediate Labeled-data Tasks (STILTs) is a widely applied technique, which first fine-tunes the pretrained language models on an intermediate task before on the target task of interest. While STILTs is able to…

Computation and Language · Computer Science 2021-09-02 Ting-Yun Chang, Chi-Jen Lu

MML: Maximal Multiverse Learning for Robust Fine-Tuning of Language Models

Recent state-of-the-art language models utilize a two-phase training procedure comprised of (i) unsupervised pre-training on unlabeled text, and (ii) fine-tuning for a specific supervised task. More recently, many studies have been focused…

Computation and Language · Computer Science 2019-11-15 Itzik Malkiel, Lior Wolf

Visualizing and Understanding the Effectiveness of BERT

Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks. However, it is unclear why the pre-training-then-fine-tuning paradigm can improve performance and generalization capability across different…

Computation and Language · Computer Science 2019-08-16 Yaru Hao, Li Dong, Furu Wei, Ke Xu

How Fine-Tuning Allows for Effective Meta-Learning

Representation learning has been widely studied in the context of meta-learning, enabling rapid learning of new tasks through shared representations. Recent works such as MAML have explored using fine-tuning-based metrics, which measure the…

Machine Learning · Computer Science 2021-05-06 Kurtland Chua, Qi Lei, Jason D. Lee

Self-Improving Pretraining: using post-trained models to pretrain better models

Large language models are classically trained in stages: pretraining on raw text followed by post-training for instruction following and reasoning. However, this separation creates a fundamental limitation: many desirable behaviors such as…

Computation and Language · Computer Science 2026-04-07 Ellen Xiaoqing Tan, Jack Lanchantin, Shehzaad Dhuliawala, Danwei Li, Thao Nguyen, Jing Xu, Ping Yu, Ilia Kulikov, Sainbayar Sukhbaatar, Jason Weston, Xian Li, Olga Golovneva

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

One reason pretraining on self-supervised linguistic tasks is effective is that it teaches models features that are helpful for language understanding. However, we want pretrained models to learn not only to represent linguistic features,…

Computation and Language · Computer Science 2020-10-13 Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman