Related papers: CoMMIT: Coordinated Multimodal Instruction Tuning

200 papers

Instruction-tuned large language models (LLMs) have demonstrated promising zero-shot generalization capabilities across various downstream tasks. Recent research has introduced multimodal capabilities to LLMs by integrating independently…

Computation and Language · Computer Science 2023-11-29 Utsav Garg, Erhan Bas

When adapting ICL with or without fine-tuning, we are curious about whether the instruction-tuned language model is able to achieve well-calibrated results without suffering from the problem of overconfidence (i.e., miscalibration)…

Computation and Language · Computer Science 2025-05-23 Chengzu Li, Han Zhou, Goran Glavaš, Anna Korhonen, Ivan Vulić

Multimodal Continual Instruction Tuning (MCIT) aims to finetune Multimodal Large Language Models (MLLMs) to continually align with human intent across sequential tasks. Existing approaches often rely on the Mixture-of-Experts (MoE) LoRA…

Computation and Language · Computer Science 2025-06-04 Duzhen Zhang, Yong Ren, Zhong-Zhi Li, Yahan Yu, Jiahua Dong, Chenxing Li, Zhilong Ji, Jinfeng Bai

Multilingual proficiency presents a significant challenge for large language models (LLMs). English-centric models are usually suboptimal in other languages, particularly those that are linguistically distant from English. This performance…

Computation and Language · Computer Science 2025-01-07 Geyu Lin, Bin Wang, Zhengyuan Liu, Nancy F. Chen

Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-09 Ying Shen, Zhiyang Xu, Qifan Wang, Yu Cheng, Wenpeng Yin, Lifu Huang

With instruction tuning, Large Language Models (LLMs) can enhance their ability to adhere to commands. Diverging from most works focusing on data mixing, our study concentrates on enhancing the model's capabilities from the perspective of…

Computation and Language · Computer Science 2024-10-07 Jun Rao, Xuebo Liu, Lian Lian, Shengjun Cheng, Yunjie Liao, Min Zhang

Multimodal learning has developed very fast in recent years. However, during the multimodal training process, the model tends to rely on only one modality based on which it could learn faster, thus leading to inadequate use of other…

Machine Learning · Computer Science 2024-11-05 Zirun Guo, Tao Jin, Jingyuan Chen, Zhou Zhao

Multilingual Large Language Models (LLMs) struggle with cross-lingual tasks due to data imbalances between high-resource and low-resource languages, as well as monolingual bias in pre-training. Existing methods, such as bilingual…

Computation and Language · Computer Science 2026-04-14 Weihua Zheng, Chang Liu, Zhengyuan Liu, Xin Huang, Kui Wu, Muhammad Huzaifah Md Shahrin, Aiti Aw, Roy Ka-Wei Lee

Open-source multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks. However, their reasoning capabilities remain constrained by existing instruction-tuning datasets, which were…

Computation and Language · Computer Science 2025-06-05 Jarvis Guo, Tuney Zheng, Yuelin Bai, Bo Li, Yubo Wang, King Zhu, Yizhi Li, Graham Neubig, Wenhu Chen, Xiang Yue

Multimodal large language models (MLLMs) combine visual and textual data for tasks such as image captioning and visual question answering. Proper uncertainty calibration is crucial, yet challenging, for reliable use in areas like healthcare…

Computer Vision and Pattern Recognition · Computer Science 2024-12-30 Zijun Chen, Wenbo Hu, Guande He, Zhijie Deng, Zheng Zhang, Richang Hong

This research paper addresses the challenge of modality mismatch in multimodal learning, where the modalities available during inference differ from those available at training. We propose the Text-centric Alignment for Multi-Modality…

Machine Learning · Computer Science 2024-05-22 Yun-Da Tsai, Ting-Yu Yen, Pei-Fu Guo, Zhe-Yan Li, Shou-De Lin

The impressive performance of large language models (LLMs) arises from their massive scale and heterogeneous module composition. However, this structural heterogeneity introduces additional optimization challenges. While adaptive optimizers…

Machine Learning · Computer Science 2026-05-08 Ziqing Wen, Zhouyang Liu, Jiahuan Wang, Ping Luo, Li Shen, Dongsheng Li, Tao Sun

Transformer-based language models, though not explicitly trained to mimic brain recordings, have demonstrated surprising alignment with brain activity. Progress in these models, through increased size, instruction-tuning, and…

The adaption of multilingual pre-trained LLMs into eloquent and helpful assistants is essential to facilitate their use across different language regions. In that spirit, we are the first to conduct an extensive study of the performance of…

Computation and Language · Computer Science 2024-10-11 Alexander Arno Weber, Klaudia Thellmann, Jan Ebert, Nicolas Flores-Herr, Jens Lehmann, Michael Fromm, Mehdi Ali

Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs). We study in-context learning (ICL), a…

Computation and Language · Computer Science 2024-03-29 Hanlin Zhang, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade

Despite remarkable advancements in Multimodal Large Language Models (MLLMs), a fundamental question remains: are MLLMs robust to contradicting modalities? To rigorously study this, we introduce MMA-Bench comprising videos and tasks that…

Computer Vision and Pattern Recognition · Computer Science 2025-12-04 Tianle Chen, Chaitanya Chakka, Arjun Reddy Akula, Xavier Thomas, Deepti Ghadiyaram

Multimodal large language models (MLLMs), such as GPT-4o, are garnering significant attention. During the exploration of MLLM training, we identified Modality Composition Incoherence, a phenomenon that the proportion of a certain modality…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-13 Yijie Zheng, Bangjun Xiao, Lei Shi, Xiaoyang Li, Faming Wu, Tianyu Li, Xuefeng Xiao, Yang Zhang, Yuxuan Wang, Shouda Liu

Recent voxel-wise multimodal brain encoding studies have shown that multimodal large language models (MLLMs) exhibit a higher degree of brain alignment compared to unimodal models. More recently, instruction-tuned multimodal (IT) models…

Although In-Context Learning (ICL) brings remarkable performance gains to Large Language Models (LLMs), the improvements remain lower than fine-tuning on downstream tasks. This paper introduces Multi-Modal In-Context Tuning (MMICT), a novel…

Artificial Intelligence · Computer Science 2024-08-13 Tao Chen, Enwei Zhang, Yuting Gao, Ke Li, Xing Sun, Yan Zhang, Hui Li, Rongrong Ji

We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to…
