Related papers: Multi-task Code LLMs: Data Mix or Model Merge?

200 papers

Large Language Models (LLMs) have been adopted and deployed worldwide for a broad variety of applications. However, ensuring their safe use remains a significant challenge. Preference training and safety measures often overfit to harms…

Computation and Language · Computer Science 2024-10-15 Aakanksha, Arash Ahmadian, Seraphina Goldfarb-Tarrant, Beyza Ermis, Marzieh Fadaee, Sara Hooker

Model merging aggregates Large Language Models (LLMs) finetuned on different tasks into a stronger one. However, parameter conflicts between models lead to performance degradation in averaging. While model routing addresses this issue by…

Machine Learning · Computer Science 2025-02-12 Kunfeng Lai, Zhenheng Tang, Xinglin Pan, Peijie Dong, Xiang Liu, Haolan Chen, Li Shen, Bo Li, Xiaowen Chu

Recent advancements in code large language models (LLMs) have demonstrated remarkable capabilities in code generation and understanding. It is still challenging to build a code LLM with comprehensive performance yet ultimate efficiency.…

Model merging provides a scalable alternative to multi-task training by combining specialized finetuned models through parameter arithmetic, enabling efficient deployment without the need for joint training or access to all task data. While…

Machine Learning · Computer Science 2025-10-21 Yifei He, Siqi Zeng, Yuzheng Hu, Rui Yang, Tong Zhang, Han Zhao

Mixing datasets for fine-tuning large models (LMs) has become critical for maximizing performance on downstream tasks. However, composing effective dataset mixtures typically relies on heuristics and trial-and-error, often requiring…

Machine Learning · Computer Science 2025-05-23 Zhixu Silvia Tao, Kasper Vinken, Hao-Wei Yeh, Avi Cooper, Xavier Boix

Large Language Models (LLMs) have demonstrated exceptional capabilities across diverse natural language processing (NLP) tasks. The release of open-source LLMs like LLaMA and Qwen has triggered the development of numerous fine-tuned models…

Computation and Language · Computer Science 2025-06-17 Zichuan Fu, Xian Wu, Yejing Wang, Wanyu Wang, Shanshan Ye, Hongzhi Yin, Yi Chang, Yefeng Zheng, Xiangyu Zhao

Recent advancement in code understanding and generation demonstrates that code LLMs fine-tuned on a high-quality instruction dataset can gain powerful capabilities to address wide-ranging code-related tasks. However, most previous existing…

Computation and Language · Computer Science 2025-02-12 Jian Yang, Wei Zhang, Jiaxi Yang, Yibo Miao, Shanghaoran Quan, Zhenhe Wu, Qiyao Peng, Liqun Yang, Tianyu Liu, Zeyu Cui, Binyuan Hui, Junyang Lin

Recent advances in large language models have led to numerous task-specialized fine-tuned variants, creating a need for efficient model merging techniques that preserve specialized capabilities while avoiding costly retraining. While…

Computation and Language · Computer Science 2025-02-20 Shuqi Liu, Han Wu, Bowei He, Xiongwei Han, Mingxuan Yuan, Linqi Song

Code LLMs have emerged as a specialized research field, with remarkable studies dedicated to enhancing models' coding capabilities through fine-tuning on pre-trained models. Previous fine-tuning approaches were typically tailored to…

Machine Learning · Computer Science 2023-11-07 Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li

Large language models (LLMs) have enabled the development of numerous specialized, task-specific variants. However, the maintenance and deployment of these individual models present substantial challenges in terms of resource utilization…

Machine Learning · Computer Science 2024-11-04 Quy-Anh Dang, Chris Ngo

Selecting the best data mixture is critical for successful Supervised Fine-Tuning (SFT) of Multimodal Large Language Models. However, determining the optimal mixture weights across multiple domain-specific datasets remains a significant…

Machine Learning · Computer Science 2026-02-06 Davide Berasi, Matteo Farina, Massimiliano Mancini, Elisa Ricci

Achieving balanced alignment of large language models (LLMs) in terms of Helpfulness, Honesty, and Harmlessness (3H optimization) constitutes a cornerstone of responsible AI. Existing methods like data mixture strategies face limitations,…

Computation and Language · Computer Science 2026-02-03 Jinluan Yang, Dingnan Jin, Anke Tang, Li Shen, Didi Zhu, Zhengyu Chen, Ziyu Zhao, Daixin Wang, Qing Cui, Zhiqiang Zhang, Jun Zhou, Fei Wu, Kun Kuang

Optimizing data mixtures is essential for unlocking the full potential of large language models (LLMs), yet identifying the optimal composition remains computationally prohibitive due to reliance on heuristic trials or expensive proxy…

Machine Learning · Computer Science 2026-01-27 Jiapeng Wang, Changxin Tian, Kunlong Chen, Ziqi Liu, Jiaxin Mao, Wayne Xin Zhao, Zhiqiang Zhang, Jun Zhou

Model merging and task arithmetic have emerged as promising scalable approaches to merge multiple single-task checkpoints into one multi-task model, but their applicability is reduced by significant performance loss. Previous works have…

Machine Learning · Computer Science 2024-05-14 Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez, François Fleuret, Pascal Frossard

Foundation models update slowly due to resource-intensive training, whereas domain-specific models evolve rapidly between releases. Model merging seeks to combine multiple expert models into a single, more capable model, reducing storage…

Artificial Intelligence · Computer Science 2026-03-04 Yongxian Wei, Runxi Cheng, Weike Jin, Enneng Yang, Li Shen, Lu Hou, Sinan Du, Chun Yuan, Xiaochun Cao, Dacheng Tao

Multi-task learning (MTL) concurrently trains a model on diverse task datasets to exploit common features, thereby improving overall performance across the tasks. Recent studies have dedicated efforts to merging multiple independent model…

Machine Learning · Computer Science 2025-06-16 Bingjie Zhang, Hongkang Li, Changlong Shi, Guowei Rong, He Zhao, Dongsheng Wang, Dandan Guo, Meng Wang

Recently, task-specific fine-tuning has been used to improve the performance of large language models (LLMs) on downstream tasks. Through the integration of diverse LLMs, the overall competency of LLMs is…

Computation and Language · Computer Science 2024-12-23 Mingyang Zhang, Jing Liu, Ganggui Ding, Xinyi Yu, Linlin Ou, Bohan Zhuang

Merging methods combine the weights of multiple language models (LMs) to leverage their capacities, such as for domain adaptation. While existing studies investigate merged models from a solely behavioral perspective, we offer the first…

Computation and Language · Computer Science 2025-12-16 Yutaro Sigrist, Andreas Waldis

Model merging combines the parameters of multiple neural networks into a single model without additional training. As fine-tuned large language models (LLMs) proliferate, merging offers a computationally efficient alternative to ensembles…

Computation and Language · Computer Science 2026-03-31 Mingyang Song, Mao Zheng

Multi-task model merging aims to consolidate knowledge from multiple fine-tuned task-specific experts into a unified model while minimizing performance degradation. Existing methods primarily approach this by minimizing differences between…

Machine Learning · Computer Science 2025-10-28 Wenju Sun, Qingyong Li, Wen Wang, Yang Liu, Yangli-ao Geng, Boyang Li