Related papers: CoMMIT: Coordinated Multimodal Instruction Tuning

On the Performance of Multimodal Language Models

Instruction-tuned large language models (LLMs) have demonstrated promising zero-shot generalization capabilities across various downstream tasks. Recent research has introduced multimodal capabilities to LLMs by integrating independently…

Computation and Language · Computer Science 2023-11-29 Utsav Garg, Erhan Bas

Large Language Models are Miscalibrated In-Context Learners

When adapting ICL with or without fine-tuning, we are curious about whether the instruction-tuned language model is able to achieve well-calibrated results without suffering from the problem of overconfidence (i.e., miscalibration)…

Computation and Language · Computer Science 2025-05-23 Chengzu Li, Han Zhou, Goran Glavaš, Anna Korhonen, Ivan Vulić

Enhancing Multimodal Continual Instruction Tuning with BranchLoRA

Multimodal Continual Instruction Tuning (MCIT) aims to finetune Multimodal Large Language Models (MLLMs) to continually align with human intent across sequential tasks. Existing approaches often rely on the Mixture-of-Experts (MoE) LoRA…

Computation and Language · Computer Science 2025-06-04 Duzhen Zhang, Yong Ren, Zhong-Zhi Li, Yahan Yu, Jiahua Dong, Chenxing Li, Zhilong Ji, Jinfeng Bai

CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment

Multilingual proficiency presents a significant challenge for large language models (LLMs). English-centric models are usually suboptimal in other languages, particularly those that are linguistically distant from English. This performance…

Computation and Language · Computer Science 2025-01-07 Geyu Lin, Bin Wang, Zhengyuan Liu, Nancy F. Chen

Multimodal Instruction Tuning with Conditional Mixture of LoRA

Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-09 Ying Shen, Zhiyang Xu, Qifan Wang, Yu Cheng, Wenpeng Yin, Lifu Huang

CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions

With instruction tuning, Large Language Models (LLMs) can enhance their ability to adhere to commands. Diverging from most works focusing on data mixing, our study concentrates on enhancing the model's capabilities from the perspective of…

Computation and Language · Computer Science 2024-10-07 Jun Rao, Xuebo Liu, Lian Lian, Shengjun Cheng, Yunjie Liao, Min Zhang

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

Multimodal learning has developed very fast in recent years. However, during the multimodal training process, the model tends to rely on only one modality based on which it could learn faster, thus leading to inadequate use of other…

Machine Learning · Computer Science 2024-11-05 Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao

Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance

Multilingual Large Language Models (LLMs) struggle with cross-lingual tasks due to data imbalances between high-resource and low-resource languages, as well as monolingual bias in pre-training. Existing methods, such as bilingual…

Computation and Language · Computer Science 2026-04-14 Weihua Zheng, Chang Liu, Zhengyuan Liu, Xin Huang, Kui Wu, Muhammad Huzaifah Md Shahrin, Aiti Aw, Roy Ka-Wei Lee

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

Open-source multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks. However, their reasoning capabilities remain constrained by existing instruction-tuning datasets, which were…

Computation and Language · Computer Science 2025-06-05 Jarvis Guo, Tuney Zheng, Yuelin Bai, Bo Li, Yubo Wang, King Zhu, Yizhi Li, Graham Neubig, Wenhu Chen, Xiang Yue

Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models

Multimodal large language models (MLLMs) combine visual and textual data for tasks such as image captioning and visual question answering. Proper uncertainty calibration is crucial, yet challenging, for reliable use in areas like healthcare…

Computer Vision and Pattern Recognition · Computer Science 2024-12-30 Zijun Chen, Wenbo Hu, Guande He, Zhijie Deng, Zheng Zhang, Richang Hong

Text-centric Alignment for Multi-Modality Learning

This research paper addresses the challenge of modality mismatch in multimodal learning, where the modalities available during inference differ from those available at training. We propose the Text-centric Alignment for Multi-Modality…

Machine Learning · Computer Science 2024-05-22 Yun-Da Tsai, Ting-Yu Yen, Pei-Fu Guo, Zhe-Yan Li, Shou-De Lin

Revealing Modular Gradient Noise Imbalance in LLMs: Calibrating Adam via Signal-to-Noise Ratio

The impressive performance of large language models (LLMs) arises from their massive scale and heterogeneous module composition. However, this structural heterogeneity introduces additional optimization challenges. While adaptive optimizers…

Machine Learning · Computer Science 2026-05-08 Ziqing Wen, Zhouyang Liu, Jiahuan Wang, Ping Luo, Li Shen, Dongsheng Li, Tao Sun

Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain)

Transformer-based language models, though not explicitly trained to mimic brain recordings, have demonstrated surprising alignment with brain activity. Progress in these models, through increased size, instruction-tuning, and…

Neurons and Cognition · Quantitative Biology 2025-05-27 Subba Reddy Oota, Akshett Jindal, Ishani Mondal, Khushbu Pahwa, Satya Sai Srinath Namburi, Manish Shrivastava, Maneesh Singh, Bapi S. Raju, Manish Gupta

Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?

The adaption of multilingual pre-trained LLMs into eloquent and helpful assistants is essential to facilitate their use across different language regions. In that spirit, we are the first to conduct an extensive study of the performance of…

Computation and Language · Computer Science 2024-10-11 Alexander Arno Weber, Klaudia Thellmann, Jan Ebert, Nicolas Flores-Herr, Jens Lehmann, Michael Fromm, Mehdi Ali

A Study on the Calibration of In-context Learning

Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs). We study in-context learning (ICL), a…

Computation and Language · Computer Science 2024-03-29 Hanlin Zhang, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade

Some Modalities are More Equal Than Others: Decoding and Architecting Multimodal Integration in MLLMs

Despite remarkable advancements in Multimodal Large Language Models (MLLMs), a fundamental question remains: are MLLMs robust to contradicting modalities? To rigorously study this, we introduce MMA-Bench comprising videos and tasks that…

Computer Vision and Pattern Recognition · Computer Science 2025-12-04 Tianle Chen, Chaitanya Chakka, Arjun Reddy Akula, Xavier Thomas, Deepti Ghadiyaram

OrchMLLM: Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training

Multimodal large language models (MLLMs), such as GPT-4o, are garnering significant attention. During the exploration of MLLM training, we identified Modality Composition Incoherence, a phenomenon that the proportion of a certain modality…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-13 Yijie Zheng, Bangjun Xiao, Lei Shi, Xiaoyang Li, Faming Wu, Tianyu Li, Xuefeng Xiao, Yang Zhang, Yuxuan Wang, Shouda Liu

Task-conditioned probing of instruction-tuned multimodal LLMs: Region-specific brain alignment patterns under naturalistic stimuli

Recent voxel-wise multimodal brain encoding studies have shown that multimodal large language models (MLLMs) exhibit a higher degree of brain alignment compared to unimodal models. More recently, instruction-tuned multimodal (IT) models…

Neurons and Cognition · Quantitative Biology 2026-05-21 Subba Reddy Oota, Khushbu Pahwa, Prachi Jindal, Satya Sai Srinath Namburi, Maneesh Singh, Tanmoy Chakraborty, Bapi S. Raju, Manish Gupta

MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples

Although In-Context Learning (ICL) brings remarkable performance gains to Large Language Models (LLMs), the improvements remain lower than fine-tuning on downstream tasks. This paper introduces Multi-Modal In-Context Tuning (MMICT), a novel…

Artificial Intelligence · Computer Science 2024-08-13 Tao Chen, Enwei Zhang, Yuting Gao, Ke Li, Xing Sun, Yan Zhang, Hui Li, Rongrong Ji

ImageBind-LLM: Multi-modality Instruction Tuning

We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to…

Multimedia · Computer Science 2023-09-13 Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao