Related papers: Multi-Stage Influence Function

Muppet: Massive Multi-task Representations with Pre-Finetuning

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is…

Computation and Language · Computer Science · 2021-01-28 · Armen Aghajanyan, Anchit Gupta, Akshat Shrivastava, Xilun Chen, Luke Zettlemoyer, Sonal Gupta

Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization

Pre-trained large language models (LLMs) are commonly fine-tuned to adapt to downstream tasks. Since the majority of knowledge is acquired during pre-training, attributing the predictions of fine-tuned LLMs to their pre-training data may…

Computation and Language · Computer Science · 2026-02-09 · Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Jiang Zong, Hao Peng, Jianwei Yin

What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation

The pretrain-finetune paradigm usually improves downstream performance over training a model from scratch on the same task, becoming commonplace across many areas of machine learning. While pretraining is empirically observed to be…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-13 · Gabriele Merlin, Vedant Nanda, Ruchit Rawal, Mariya Toneva

Self-Improving Pretraining: using post-trained models to pretrain better models

Large language models are classically trained in stages: pretraining on raw text followed by post-training for instruction following and reasoning. However, this separation creates a fundamental limitation: many desirable behaviors such as…

Computation and Language · Computer Science · 2026-04-07 · Ellen Xiaoqing Tan, Jack Lanchantin, Shehzaad Dhuliawala, Danwei Li, Thao Nguyen, Jing Xu, Ping Yu, Ilia Kulikov, Sainbayar Sukhbaatar, Jason Weston, Xian Li, Olga Golovneva

Amuro and Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models

The development of large language models leads to the formation of a pre-train-then-align paradigm, in which the model is typically pre-trained on a large text corpus and undergoes a tuning stage to align the model with human preference or…

Computation and Language · Computer Science · 2025-03-19 · Kaiser Sun, Mark Dredze

The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current…

Machine Learning · Computer Science · 2024-06-21 · Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia

On Influence Functions, Classification Influence, Relative Influence, Memorization and Generalization

Machine learning systems such as large scale recommendation systems or natural language processing systems are usually trained on billions of training points and are associated with hundreds of billions or trillions of parameters. Improving…

Machine Learning · Computer Science · 2023-05-26 · Michael Kounavis, Ousmane Dia, Ilqar Ramazanli

Influence Scores at Scale for Efficient Language Data Sampling

Modern ML systems ingest data aggregated from diverse sources, such as synthetic, human-annotated, and live customer traffic. Understanding which examples are important to the performance of a learning algorithm is crucial for…

Machine Learning · Computer Science · 2023-11-29 · Nikhil Anand, Joshua Tan, Maria Minakova

muNet: Evolving Pretrained Deep Neural Networks into Scalable Auto-tuning Multitask Systems

Most uses of machine learning today involve training a model from scratch for a particular task, or sometimes starting with a model pretrained on a related task and then fine-tuning on a downstream task. Both approaches offer limited…

Machine Learning · Computer Science · 2022-05-26 · Andrea Gesmundo, Jeff Dean

Influence Functions for Sequence Tagging Models

Many language tasks (e.g., Named Entity Recognition, Part-of-Speech tagging, and Semantic Role Labeling) are naturally framed as sequence tagging problems. However, there has been comparatively little work on interpretability methods for…

Computation and Language · Computer Science · 2022-10-26 · Sarthak Jain, Varun Manjunatha, Byron C. Wallace, Ani Nenkova

Enhanced Transfer Learning with ImageNet Trained Classification Layer

Parameter fine tuning is a transfer learning approach whereby learned parameters from pre-trained source network are transferred to the target network followed by fine-tuning. Prior research has shown that this approach is capable of…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-20 · Tasfia Shermin, Shyh Wei Teng, Manzur Murshed, Guojun Lu, Ferdous Sohel, Manoranjan Paul

An Empirical Study on Influence-Based Pretraining Data Selection for Code Large Language Models

Recent advancements in code large language models (Code-LLMs) have demonstrated remarkable capabilities in resolving programming related tasks. Meanwhile, researchers have recognized that the quality of pre-training data is crucial for…

Software Engineering · Computer Science · 2026-04-10 · Chengli Xing, Zhengran Zeng, Gexiang Fang, Rui Xie, Wei Ye, Shikun Zhang

No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence

Pre-trained models have been shown effective in many code intelligence tasks. These models are pre-trained on large-scale unlabeled corpus and then fine-tuned in downstream tasks. However, as the inputs to pre-training and downstream tasks…

Software Engineering · Computer Science · 2022-07-26 · Chaozheng Wang, Yuanhang Yang, Cuiyun Gao, Yun Peng, Hongyu Zhang, Michael R. Lyu

MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators

Prompting has recently been shown as a promising approach for applying pre-trained language models to perform downstream tasks. We present Multi-Stage Prompting (MSP), a simple and automatic approach for leveraging pre-trained language…

Computation and Language · Computer Science · 2022-03-18 · Zhixing Tan, Xiangwen Zhang, Shuo Wang, Yang Liu

Self-Improving Embodied Foundation Models

Foundation models trained on web-scale data have revolutionized robotics, but their application to low-level control remains largely limited to behavioral cloning. Drawing inspiration from the success of the reinforcement learning stage in…

Machine Learning · Computer Science · 2025-09-19 · Seyed Kamyar Seyed Ghasemipour, Ayzaan Wahid, Jonathan Tompson, Pannag Sanketi, Igor Mordatch

Efficient NLP Model Finetuning via Multistage Data Filtering

As model finetuning is central to the modern NLP, we set to maximize its efficiency. Motivated by redundancy in training examples and the sheer sizes of pretrained models, we exploit a key opportunity: training only on important data. To…

Computation and Language · Computer Science · 2023-05-22 · Xu Ouyang, Shahina Mohd Azam Ansari, Felix Xiaozhu Lin, Yangfeng Ji

Empirical influence functions to understand the logic of fine-tuning

Understanding the process of learning in neural networks is crucial for improving their performance and interpreting their behavior. This can be approximately understood by asking how a model's output is influenced when we fine-tune on a…

Machine Learning · Computer Science · 2024-06-04 · Jordan K. Matelsky, Lyle Ungar, Konrad P. Kording

Don't Stop Pretraining? Make Prompt-based Fine-tuning Powerful Learner

Language models (LMs) trained on vast quantities of unlabelled data have greatly advanced the field of natural language processing (NLP). In this study, we re-visit the widely accepted notion in NLP that continued pre-training LMs on…

Computation and Language · Computer Science · 2023-10-09 · Zhengxiang Shi, Aldo Lipani

Towards Anytime Fine-tuning: Continually Pre-trained Language Models with Hypernetwork Prompt

Continual pre-training has been urgent for adapting a pre-trained model to a multitude of domains and tasks in the fast-evolving world. In practice, a continually pre-trained model is expected to demonstrate not only greater capacity when…

Computation and Language · Computer Science · 2023-10-23 · Gangwei Jiang, Caigao Jiang, Siqiao Xue, James Y. Zhang, Jun Zhou, Defu Lian, Ying Wei

Exploring the Effectiveness of Multi-stage Fine-tuning for Cross-encoder Re-rankers

State-of-the-art cross-encoders can be fine-tuned to be highly effective in passage re-ranking. The typical fine-tuning process of cross-encoders as re-rankers requires large amounts of manually labelled data, a contrastive learning…

Information Retrieval · Computer Science · 2025-03-31 · Francesca Pezzuti, Sean MacAvaney, Nicola Tonellotto