Related papers: Multi-Stage Influence Function

200 papers

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is…

Computation and Language · Computer Science 2021-01-28 Armen Aghajanyan , Anchit Gupta , Akshat Shrivastava , Xilun Chen , Luke Zettlemoyer , Sonal Gupta

Pre-trained large language models (LLMs) are commonly fine-tuned to adapt to downstream tasks. Since the majority of knowledge is acquired during pre-training, attributing the predictions of fine-tuned LLMs to their pre-training data may…

Computation and Language · Computer Science 2026-02-09 Yuntai Bao , Xuhong Zhang , Tianyu Du , Xinkui Zhao , Jiang Zong , Hao Peng , Jianwei Yin

The pretrain-finetune paradigm usually improves downstream performance over training a model from scratch on the same task, becoming commonplace across many areas of machine learning. While pretraining is empirically observed to be…

Computer Vision and Pattern Recognition · Computer Science 2023-07-13 Gabriele Merlin , Vedant Nanda , Ruchit Rawal , Mariya Toneva

Large language models are classically trained in stages: pretraining on raw text followed by post-training for instruction following and reasoning. However, this separation creates a fundamental limitation: many desirable behaviors such as…

The development of large language models leads to the formation of a pre-train-then-align paradigm, in which the model is typically pre-trained on a large text corpus and undergoes a tuning stage to align the model with human preference or…

Computation and Language · Computer Science 2025-03-19 Kaiser Sun , Mark Dredze

Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current…

Machine Learning · Computer Science 2024-06-21 Myeongseob Ko , Feiyang Kang , Weiyan Shi , Ming Jin , Zhou Yu , Ruoxi Jia

Machine learning systems such as large scale recommendation systems or natural language processing systems are usually trained on billions of training points and are associated with hundreds of billions or trillions of parameters. Improving…

Machine Learning · Computer Science 2023-05-26 Michael Kounavis , Ousmane Dia , Ilqar Ramazanli

Modern ML systems ingest data aggregated from diverse sources, such as synthetic, human-annotated, and live customer traffic. Understanding which examples are important to the performance of a learning algorithm is crucial for…

Machine Learning · Computer Science 2023-11-29 Nikhil Anand , Joshua Tan , Maria Minakova

Most uses of machine learning today involve training a model from scratch for a particular task, or sometimes starting with a model pretrained on a related task and then fine-tuning on a downstream task. Both approaches offer limited…

Machine Learning · Computer Science 2022-05-26 Andrea Gesmundo , Jeff Dean

Many language tasks (e.g., Named Entity Recognition, Part-of-Speech tagging, and Semantic Role Labeling) are naturally framed as sequence tagging problems. However, there has been comparatively little work on interpretability methods for…

Computation and Language · Computer Science 2022-10-26 Sarthak Jain , Varun Manjunatha , Byron C. Wallace , Ani Nenkova

Parameter fine tuning is a transfer learning approach whereby learned parameters from pre-trained source network are transferred to the target network followed by fine-tuning. Prior research has shown that this approach is capable of…

Computer Vision and Pattern Recognition · Computer Science 2019-09-20 Tasfia Shermin , Shyh Wei Teng , Manzur Murshed , Guojun Lu , Ferdous Sohel , Manoranjan Paul

Recent advancements in code large language models (Code-LLMs) have demonstrated remarkable capabilities in resolving programming related tasks. Meanwhile, researchers have recognized that the quality of pre-training data is crucial for…

Software Engineering · Computer Science 2026-04-10 Chengli Xing , Zhengran Zeng , Gexiang Fang , Rui Xie , Wei Ye , Shikun Zhang

Pre-trained models have been shown effective in many code intelligence tasks. These models are pre-trained on large-scale unlabeled corpus and then fine-tuned in downstream tasks. However, as the inputs to pre-training and downstream tasks…

Software Engineering · Computer Science 2022-07-26 Chaozheng Wang , Yuanhang Yang , Cuiyun Gao , Yun Peng , Hongyu Zhang , Michael R. Lyu

Prompting has recently been shown as a promising approach for applying pre-trained language models to perform downstream tasks. We present Multi-Stage Prompting (MSP), a simple and automatic approach for leveraging pre-trained language…

Computation and Language · Computer Science 2022-03-18 Zhixing Tan , Xiangwen Zhang , Shuo Wang , Yang Liu

Foundation models trained on web-scale data have revolutionized robotics, but their application to low-level control remains largely limited to behavioral cloning. Drawing inspiration from the success of the reinforcement learning stage in…

Machine Learning · Computer Science 2025-09-19 Seyed Kamyar Seyed Ghasemipour , Ayzaan Wahid , Jonathan Tompson , Pannag Sanketi , Igor Mordatch

As model finetuning is central to the modern NLP, we set to maximize its efficiency. Motivated by redundancy in training examples and the sheer sizes of pretrained models, we exploit a key opportunity: training only on important data. To…

Computation and Language · Computer Science 2023-05-22 Xu Ouyang , Shahina Mohd Azam Ansari , Felix Xiaozhu Lin , Yangfeng Ji

Understanding the process of learning in neural networks is crucial for improving their performance and interpreting their behavior. This can be approximately understood by asking how a model's output is influenced when we fine-tune on a…

Machine Learning · Computer Science 2024-06-04 Jordan K. Matelsky , Lyle Ungar , Konrad P. Kording

Language models (LMs) trained on vast quantities of unlabelled data have greatly advanced the field of natural language processing (NLP). In this study, we re-visit the widely accepted notion in NLP that continued pre-training LMs on…

Computation and Language · Computer Science 2023-10-09 Zhengxiang Shi , Aldo Lipani

Continual pre-training has been urgent for adapting a pre-trained model to a multitude of domains and tasks in the fast-evolving world. In practice, a continually pre-trained model is expected to demonstrate not only greater capacity when…

Computation and Language · Computer Science 2023-10-23 Gangwei Jiang , Caigao Jiang , Siqiao Xue , James Y. Zhang , Jun Zhou , Defu Lian , Ying Wei

State-of-the-art cross-encoders can be fine-tuned to be highly effective in passage re-ranking. The typical fine-tuning process of cross-encoders as re-rankers requires large amounts of manually labelled data, a contrastive learning…

Information Retrieval · Computer Science 2025-03-31 Francesca Pezzuti , Sean MacAvaney , Nicola Tonellotto