Related papers: Multi-task Code LLMs: Data Mix or Model Merge?

Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning

Large Language Models (LLMs) have been adopted and deployed worldwide for a broad variety of applications. However, ensuring their safe use remains a significant challenge. Preference training and safety measures often overfit to harms…

Computation and Language · Computer Science 2024-10-15 Aakanksha, Arash Ahmadian, Seraphina Goldfarb-Tarrant, Beyza Ermis, Marzieh Fadaee, Sara Hooker

Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing

Model merging aggregates Large Language Models (LLMs) finetuned on different tasks into a stronger one. However, parameter conflicts between models lead to performance degradation in averaging. While model routing addresses this issue by…

Machine Learning · Computer Science 2025-02-12 Kunfeng Lai, Zhenheng Tang, Xinglin Pan, Peijie Dong, Xiang Liu, Haolan Chen, Li Shen, Bo Li, Xiaowen Chu

Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM

Recent advancements in code large language models (LLMs) have demonstrated remarkable capabilities in code generation and understanding. It is still challenging to build a code LLM with comprehensive performance yet ultimate efficiency.…

Machine Learning · Computer Science 2025-03-25 Codefuse, Ling Team: Wenting Cai, Yuchen Cao, Chaoyu Chen, Chen Chen, Siba Chen, Qing Cui, Peng Di, Junpeng Fang, Zi Gong, Ting Guo, Zhengyu He, Yang Huang, Cong Li, Jianguo Li, Zheng Li, Shijie Lian, BingChang Liu, Songshan Luo, Shuo Mao, Min Shen, Jian Wu, Jiaolong Yang, Wenjie Yang, Tong Ye, Hang Yu, Wei Zhang, Zhenduo Zhang, Hailin Zhao, Xunjin Zheng, Jun Zhou

MergeBench: A Benchmark for Merging Domain-Specialized LLMs

Model merging provides a scalable alternative to multi-task training by combining specialized finetuned models through parameter arithmetic, enabling efficient deployment without the need for joint training or access to all task data. While…

Machine Learning · Computer Science 2025-10-21 Yifei He, Siqi Zeng, Yuzheng Hu, Rui Yang, Tong Zhang, Han Zhao

Merge to Mix: Mixing Datasets via Model Merging

Mixing datasets for fine-tuning large models (LMs) has become critical for maximizing performance on downstream tasks. However, composing effective dataset mixtures typically relies on heuristics and trial-and-error, often requiring…

Machine Learning · Computer Science 2025-05-23 Zhixu Silvia Tao, Kasper Vinken, Hao-Wei Yeh, Avi Cooper, Xavier Boix

Training-free LLM Merging for Multi-task Learning

Large Language Models (LLMs) have demonstrated exceptional capabilities across diverse natural language processing (NLP) tasks. The release of open-source LLMs like LLaMA and Qwen has triggered the development of numerous fine-tuned models…

Computation and Language · Computer Science 2025-06-17 Zichuan Fu, Xian Wu, Yejing Wang, Wanyu Wang, Shanshan Ye, Hongzhi Yin, Yi Chang, Yefeng Zheng, Xiangyu Zhao

Multi-Agent Collaboration for Multilingual Code Instruction Tuning

Recent advancement in code understanding and generation demonstrates that code LLMs fine-tuned on a high-quality instruction dataset can gain powerful capabilities to address wide-ranging code-related tasks. However, most previous existing…

Computation and Language · Computer Science 2025-02-12 Jian Yang, Wei Zhang, Jiaxi Yang, Yibo Miao, Shanghaoran Quan, Zhenhe Wu, Qiyao Peng, Liqun Yang, Tianyu Liu, Zeyu Cui, Binyuan Hui, Junyang Lin

Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models

Recent advances in large language models have led to numerous task-specialized fine-tuned variants, creating a need for efficient model merging techniques that preserve specialized capabilities while avoiding costly retraining. While…

Computation and Language · Computer Science 2025-02-20 Shuqi Liu, Han Wu, Bowei He, Xiongwei Han, Mingxuan Yuan, Linqi Song

MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning

Code LLMs have emerged as a specialized research field, with remarkable studies dedicated to enhancing models' coding capabilities through fine-tuning on pre-trained models. Previous fine-tuning approaches were typically tailored to…

Machine Learning · Computer Science 2023-11-07 Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li

MoD: A Distribution-Based Approach for Merging Large Language Models

Large language models (LLMs) have enabled the development of numerous specialized, task-specific variants. However, the maintenance and deployment of these individual models present substantial challenges in terms of resource utilization…

Machine Learning · Computer Science 2024-11-04 Quy-Anh Dang, Chris Ngo

Linear Model Merging Unlocks Simple and Scalable Multimodal Data Mixture Optimization

Selecting the best data mixture is critical for successful Supervised Fine-Tuning (SFT) of Multimodal Large Language Models. However, determining the optimal mixture weights across multiple domain-specific datasets remains a significant…

Machine Learning · Computer Science 2026-02-06 Davide Berasi, Matteo Farina, Massimiliano Mancini, Elisa Ricci

Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging

Achieving balanced alignment of large language models (LLMs) in terms of Helpfulness, Honesty, and Harmlessness (3H optimization) constitutes a cornerstone of responsible AI. Existing methods like data mixture strategies face limitations,…

Computation and Language · Computer Science 2026-02-03 Jinluan Yang, Dingnan Jin, Anke Tang, Li Shen, Didi Zhu, Zhengyu Chen, Ziyu Zhao, Daixin Wang, Qing Cui, Zhiqiang Zhang, Jun Zhou, Fei Wu, Kun Kuang

MergeMix: Optimizing Mid-Training Data Mixtures via Learnable Model Merging

Optimizing data mixtures is essential for unlocking the full potential of large language models (LLMs), yet identifying the optimal composition remains computationally prohibitive due to reliance on heuristic trials or expensive proxy…

Machine Learning · Computer Science 2026-01-27 Jiapeng Wang, Changxin Tian, Kunlong Chen, Ziqi Liu, Jiaxin Mao, Wayne Xin Zhao, Zhiqiang Zhang, Jun Zhou

Localizing Task Information for Improved Model Merging and Compression

Model merging and task arithmetic have emerged as promising scalable approaches to merge multiple single-task checkpoints to one multi-task model, but their applicability is reduced by significant performance loss. Previous works have…

Machine Learning · Computer Science 2024-05-14 Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez, François Fleuret, Pascal Frossard

OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging

Foundation models update slowly due to resource-intensive training, whereas domain-specific models evolve rapidly between releases. Model merging seeks to combine multiple expert models into a single, more capable model, reducing storage…

Artificial Intelligence · Computer Science 2026-03-04 Yongxian Wei, Runxi Cheng, Weike Jin, Enneng Yang, Li Shen, Lu Hou, Sinan Du, Chun Yuan, Xiaochun Cao, Dacheng Tao

Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data

Multi-task learning (MTL) concurrently trains a model on diverse task datasets to exploit common features, thereby improving overall performance across the tasks. Recent studies have dedicated efforts to merging multiple independent model…

Machine Learning · Computer Science 2025-06-16 Bingjie Zhang, Hongkang Li, Changlong Shi, Guowei Rong, He Zhao, Dongsheng Wang, Dandan Guo, Meng Wang

Channel Merging: Preserving Specialization for Merged Experts

Lately, the practice of utilizing task-specific fine-tuning has been implemented to improve the performance of large language models (LLM) in subsequent tasks. Through the integration of diverse LLMs, the overall competency of LLMs is…

Computation and Language · Computer Science 2024-12-23 Mingyang Zhang, Jing Liu, Ganggui Ding, Xinyi Yu, Linlin Ou, Bohan Zhuang

A Pipeline to Assess Merging Methods via Behavior and Internals

Merging methods combine the weights of multiple language models (LMs) to leverage their capacities, such as for domain adaptation. While existing studies investigate merged models from a solely behavioral perspective, we offer the first…

Computation and Language · Computer Science 2025-12-16 Yutaro Sigrist, Andreas Waldis

Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions

Model merging combines the parameters of multiple neural networks into a single model without additional training. As fine-tuned large language models (LLMs) proliferate, merging offers a computationally efficient alternative to ensembles…

Computation and Language · Computer Science 2026-03-31 Mingyang Song, Mao Zheng

Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration

Multi-task model merging aims to consolidate knowledge from multiple fine-tuned task-specific experts into a unified model while minimizing performance degradation. Existing methods primarily approach this by minimizing differences between…

Machine Learning · Computer Science 2025-10-28 Wenju Sun, Qingyong Li, Wen Wang, Yang Liu, Yangli-ao Geng, Boyang Li