Related papers: Parameter-Efficient Transfer Learning for Audio-Vi…

200 papers

Recently, fine-tuning language models pre-trained on large text corpora has provided huge improvements on vision-and-language (V&L) tasks as well as on pure language tasks. However, fine-tuning the entire parameter set of pre-trained…

Computer Vision and Pattern Recognition · Computer Science 2022-03-25 Yi-Lin Sung, Jaemin Cho, Mohit Bansal

Pre-trained vision-language models provide a robust foundation for efficient transfer learning across various downstream tasks. In the field of video action recognition, mainstream approaches often introduce additional modules to capture…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Haoxing Chen, Zizheng Huang, Yan Hong, Yanshuo Wang, Zhongcai Lyu, Zhuoer Xu, Jun Lan, Zhangxuan Gu

Capitalizing on large pre-trained models for various downstream tasks of interest has recently emerged with promising performance. Due to the ever-growing model size, the standard full fine-tuning based task adaptation strategy becomes…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Junting Pan, Ziyi Lin, Xiatian Zhu, Jing Shao, Hongsheng Li

Fine-tuning of self-supervised models is a powerful transfer learning method in a variety of fields, including speech processing, since it can utilize generic feature representations obtained from large amounts of unlabeled data.…

Multimedia · Computer Science 2022-12-07 Shinta Otake, Rei Kawakami, Nakamasa Inoue

Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP. Adapter tuning consists in freezing pretrained parameters of a model and injecting lightweight modules between layers, resulting in the addition of…

Computation and Language · Computer Science 2021-07-14 Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

Vision-language retrieval is an important multi-modal learning topic, where the goal is to retrieve the most relevant visual candidate for a given text query. Recently, pre-trained models, e.g., CLIP, show great potential on retrieval…

Computer Vision and Pattern Recognition · Computer Science 2025-09-03 Haojun Jiang, Jianke Zhang, Rui Huang, Chunjiang Ge, Zanlin Ni, Shiji Song, Gao Huang

This paper addresses the issues of parameter redundancy, rigid structure, and limited task adaptability in the fine-tuning of large language models. It proposes an adapter-based fine-tuning method built on a structure-learnable mechanism.…

Computation and Language · Computer Science 2025-09-04 Ming Gong, Yingnan Deng, Nia Qi, Yujun Zou, Zhihao Xue, Yun Zi

Point cloud video understanding is critical for robotics as it accurately encodes motion and scene interaction. We recognize that 4D datasets are far scarcer than 3D ones, which hampers the scalability of self-supervised 4D models. A…

Computer Vision and Pattern Recognition · Computer Science 2026-04-29 Yiding Sun, Jihua Zhu, Haozhe Cheng, Chaoyi Lu, Zhichuan Yang, Lin Chen, Yaonan Wang

Large-scale pre-trained models have achieved remarkable success in various computer vision tasks. A standard approach to leverage these models is to fine-tune all model parameters for downstream tasks, which poses challenges in terms of…

Computer Vision and Pattern Recognition · Computer Science 2023-12-18 Yi Xin, Junlong Du, Qiang Wang, Zhiwen Lin, Ke Yan

Adapter-based parameter-efficient transfer learning has achieved exciting results in vision-language models. Traditional adapter methods often require training or fine-tuning, facing challenges such as insufficient samples or resource…

Computer Vision and Pattern Recognition · Computer Science 2024-04-22 Juncheng Yang, Zuchao Li, Shuai Xie, Weiping Zhu, Wei Yu, Shijun Li

Adapting a large language model for multiple-attribute text style transfer via fine-tuning can be challenging due to the significant amount of computational resources and labeled data required for the specific task. In this paper, we…

Computation and Language · Computer Science 2023-05-11 Zhiqiang Hu, Roy Ka-Wei Lee, Nancy F. Chen

Large-scale vision-language pre-trained models have shown promising transferability to various downstream tasks. As the size of these foundation models and the number of downstream tasks grow, the standard full fine-tuning paradigm becomes…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Haoyu Lu, Yuqi Huo, Guoxing Yang, Zhiwu Lu, Wei Zhan, Masayoshi Tomizuka, Mingyu Ding

In the arena of language model fine-tuning, traditional approaches such as Domain-Adaptive Pretraining (DAPT) and Task-Adaptive Pretraining (TAPT), although effective, are computationally intensive. This research introduces a novel…

Computation and Language · Computer Science 2024-05-10 Keyu Chen, Yuan Pang, Zi Yang

Fine-tuning is a popular method for adapting text-to-speech (TTS) models to new speakers. However, this approach has some challenges. Usually, fine-tuning requires several hours of high-quality speech per speaker. There is also that…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-02 Cheng-Ping Hsieh, Subhankar Ghosh, Boris Ginsburg

Recently, pre-trained Transformer models have received rising interest in the field of speech processing thanks to their great success in various downstream tasks. However, most fine-tuning approaches update all the parameters of the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-31 Junyi Peng, Themos Stafylakis, Rongzhi Gu, Oldřich Plchot, Ladislav Mošner, Lukáš Burget, Jan Černocký

Parameter-efficient transfer learning (PETL) methods have emerged as a solid alternative to the standard full fine-tuning approach. They only train a few extra parameters for each downstream task, without sacrificing performance and…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-16 Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti, Mirco Ravanelli

Fully fine-tuning pretrained large-scale transformer models has become a popular paradigm for video-language modeling tasks, such as temporal language grounding and video-language summarization. With a growing number of tasks and limited…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Khoi Le, Zhiyuan Hu, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

Parameter-efficient fine-tuning methods have emerged as a promising solution for adapting pre-trained models to various downstream tasks. While these methods perform well in single-task learning, extending them to multi-task learning…

Computer Vision and Pattern Recognition · Computer Science 2026-04-28 Neeraj Gangwar, Anshuka Rangi, Rishabh Deshmukh, Holakou Rahmanian, Yesh Dattatreya, Nickvash Kani

How to efficiently transform large language models (LLMs) into instruction followers has recently become a popular research direction, while training LLMs for multi-modal reasoning remains less explored. Although the recent LLaMA-Adapter…

Computer Vision and Pattern Recognition · Computer Science 2023-05-01 Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao

The recent success of Transformers in the language domain has motivated adapting them to a multimodal setting, where a new visual model is trained in tandem with an already pretrained language model. However, due to the excessive memory…

Computer Vision and Pattern Recognition · Computer Science 2021-09-23 Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz, Yale Song