Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning

Machine Learning 2023-12-15 v1

Abstract

This paper introduces a novel Parameter-Efficient Fine-Tuning (PEFT) framework for multi-modal, multi-task transfer learning with pre-trained language models. PEFT techniques such as LoRA, BitFit and IA3 have demonstrated performance comparable to full fine-tuning of pre-trained models on specific downstream tasks, while requiring significantly fewer trainable parameters and less GPU memory. In multi-modal fine-tuning, however, architectural modifications or full fine-tuning are often still needed. To address this, we propose Context-PEFT, which learns different groups of adaptor parameters based on each token's domain. This approach enables LoRA-like weight injection without requiring additional architectural changes. Our method is evaluated on the COCO captioning task, where it outperforms full fine-tuning under similar data constraints while being substantially more parameter-efficient and computationally economical.
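The idea of selecting adaptor parameters by token domain can be illustrated with a small sketch. This is not the authors' implementation: the class name `ContextLoRALinear`, the context labels `"text"` and `"image"`, and the pure-Python matrix helpers are all assumptions made for illustration. It assumes a LoRA-style low-rank update, with one (A, B) pair per context added to the output of a frozen linear layer.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix, used for the frozen weight and each A."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def zero_matrix(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


class ContextLoRALinear:
    """Hypothetical sketch: a frozen linear layer with one low-rank
    adaptor group per context (token domain)."""

    def __init__(self, d_in, d_out, rank, contexts, seed=0):
        rng = random.Random(seed)
        # Stand-in for a frozen pre-trained weight of shape (d_out, d_in).
        self.weight = random_matrix(d_out, d_in, rng)
        # One (A, B) pair per context. B starts at zero, as in LoRA,
        # so every adaptor initially leaves the base output unchanged.
        self.adaptors = {
            c: {"A": random_matrix(rank, d_in, rng), "B": zero_matrix(d_out, rank)}
            for c in contexts
        }

    def forward(self, tokens, context_ids):
        """Apply W x + B_c (A_c x) to each token x with context c."""
        outputs = []
        for x, c in zip(tokens, context_ids):
            base = matvec(self.weight, x)
            adaptor = self.adaptors[c]
            delta = matvec(adaptor["B"], matvec(adaptor["A"], x))
            outputs.append([b + d for b, d in zip(base, delta)])
        return outputs


# Usage: two identical token vectors routed through different adaptor groups.
layer = ContextLoRALinear(d_in=4, d_out=3, rank=2, contexts=["text", "image"])
token = [1.0, 2.0, 3.0, 4.0]
outputs = layer.forward([token, token], ["text", "image"])
```

In this sketch only the per-context A and B matrices would be trained, and the layer's input and output shapes are unchanged, which is the sense in which no architectural modification is needed. A practical version would batch tokens by context rather than loop over them.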

Cite

@article{arxiv.2312.08900,
  title  = {Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning},
  author = {Avelina Asada Hadji-Kyriacou and Ognjen Arandjelovic},
  journal= {arXiv preprint arXiv:2312.08900},
  year   = {2023}
}