Context-PEFT: efficient multi-modal, multi-task fine-tuning

Avelina Asada Hadji-Kyriacou, Ognjen Arandjelovic

Research output: Working paper › Preprint

Abstract

This paper introduces a novel Parameter-Efficient Fine-Tuning (PEFT) framework for multi-modal, multi-task transfer learning with pre-trained language models. PEFT techniques such as LoRA, BitFit and IA3 have demonstrated performance comparable to full fine-tuning of pre-trained models on specific downstream tasks, while requiring significantly fewer trainable parameters and less GPU memory. However, multi-modal fine-tuning often still demands architectural modifications or full fine-tuning. To address this, we propose Context-PEFT, which learns different groups of adaptor parameters based on each token's domain. This approach enables LoRA-like weight injection without requiring additional architectural changes. Our method is evaluated on the COCO captioning task, where it outperforms full fine-tuning under similar data constraints while offering a substantially more parameter-efficient and computationally economical solution.
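To make the idea concrete, the sketch below shows one way per-domain adaptors of the kind described above could be realised: a frozen linear layer augmented with LoRA-style low-rank matrices, where each token selects its adaptor group via an integer context label (e.g. text vs. image tokens). This is a minimal illustrative sketch under stated assumptions, not the authors' implementation; the class name, ranks and shapes are hypothetical.

```python
# Minimal sketch of context-dependent LoRA injection (illustrative only).
import torch
import torch.nn as nn


class ContextLoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int,
                 rank: int = 8, num_contexts: int = 2):
        super().__init__()
        # Frozen pre-trained projection, as in standard LoRA.
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # One low-rank adaptor pair (A, B) per context / domain.
        self.lora_A = nn.Parameter(torch.randn(num_contexts, in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_contexts, rank, out_features))

    def forward(self, x: torch.Tensor, context_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, in_features); context_ids: (batch, seq) integer domain labels.
        out = self.base(x)
        # Gather each token's adaptor matrices according to its context label.
        A = self.lora_A[context_ids]   # (batch, seq, in_features, rank)
        B = self.lora_B[context_ids]   # (batch, seq, rank, out_features)
        delta = torch.einsum("bsi,bsir->bsr", x, A)
        delta = torch.einsum("bsr,bsro->bso", delta, B)
        return out + delta


# Example usage: image tokens (label 1) followed by text tokens (label 0).
layer = ContextLoRALinear(768, 768)
tokens = torch.randn(2, 10, 768)
ctx = torch.cat([torch.ones(2, 4, dtype=torch.long),
                 torch.zeros(2, 6, dtype=torch.long)], dim=1)
output = layer(tokens, ctx)  # (2, 10, 768)
```

Because only the small per-domain A and B matrices are trained, the pre-trained weights and overall architecture remain unchanged, which is the property the abstract highlights.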
Original language: English
Publisher: arXiv
Pages: 1-13
Number of pages: 13
Publication status: Published - 14 Dec 2023
