Related papers: Improving Multimodal Large Language Models Using C…

When Continue Learning Meets Multimodal Large Language Model: A Survey

Recent advancements in Artificial Intelligence have led to the development of Multimodal Large Language Models (MLLMs). However, adapting these pre-trained models to dynamic data distributions and various tasks efficiently remains a…

Machine Learning · Computer Science · 2025-03-05 · Yukang Huo, Hao Tang

Continual Learning in Large Language Models: Methods, Challenges, and Opportunities

Continual learning (CL) has emerged as a pivotal paradigm to enable large language models (LLMs) to dynamically adapt to evolving knowledge and sequential tasks while mitigating catastrophic forgetting, a critical limitation of the static…

Computation and Language · Computer Science · 2026-03-16 · Hongyang Chen, Zhongwu Sun, Hongfei Ye, Kunchi Li, Xuemin Lin

Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge

In recent years, multimodal large language models (MLLMs) have made significant strides by training on vast high-quality image-text datasets, enabling them to generally understand images well. However, the inherent difficulty in explicitly…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-08 · Yuanze Lin, Yunsheng Li, Dongdong Chen, Weijian Xu, Ronald Clark, Philip Torr, Lu Yuan

The Revolution of Multimodal Large Language Models: A Survey

Connecting text and visual modalities plays an essential role in generative intelligence. For this reason, inspired by the success of large language models, significant research efforts are being devoted to the development of Multimodal…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-07 · Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli, Sara Sarto, Lorenzo Baraldi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

In recent years, multimodal large language models (MLLMs) have shown remarkable capabilities in tasks like visual question answering and common sense reasoning, while visual perception models have made significant strides in perception…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-25 · Guanqun Wang, Xinyu Wei, Jiaming Liu, Ray Zhang, Yichi Zhang, Kevin Zhang, Maurice Chong, Shanghang Zhang

Evaluating Multimodal Large Language Models on Educational Textbook Question Answering

Multimodal large language models (MLLMs) have shown success in vision-language tasks, but their ability to reason over complex educational materials remains largely untested. This work presents the first evaluation of state-of-the-art…

Computation and Language · Computer Science · 2025-07-16 · Hessa A. Alawwad, Anas Zafar, Areej Alhothali, Usman Naseem, Ali Alkhathlan, Amani Jamal

Towards General Continuous Memory for Vision-Language Models

Language models (LMs) and their extension, vision-language models (VLMs), have achieved remarkable performance across various tasks. However, they still struggle with complex reasoning tasks that require multimodal or multilingual…

Machine Learning · Computer Science · 2025-07-09 · Wenyi Wu, Zixuan Song, Kun Zhou, Yifei Shao, Zhiting Hu, Biwei Huang

Continual Instruction Tuning for Large Multimodal Models

Instruction tuning is now a widely adopted approach to aligning large multimodal models (LMMs) to follow human intent. It unifies the data format of vision-language tasks, enabling multi-task joint training. However, vision-language tasks…

Machine Learning · Computer Science · 2023-11-29 · Jinghan He, Haiyun Guo, Ming Tang, Jinqiao Wang

MammothModa: Multi-Modal Large Language Model

In this report, we introduce MammothModa, yet another multi-modal large language model (MLLM) designed to achieve state-of-the-art performance starting from an elementary baseline. We focus on three key design insights: (i) Integrating…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-27 · Qi She, Junwen Pan, Xin Wan, Rui Zhang, Dawei Lu, Kai Huang

Continual Learning for Generative AI: From LLMs to MLLMs and Beyond

The rapid advancement of generative models has empowered modern AI systems to comprehend and produce highly sophisticated content, even achieving human-level performance in specific domains. However, these models are fundamentally…

Machine Learning · Computer Science · 2025-08-26 · Haiyang Guo, Fanhu Zeng, Fei Zhu, Jiayi Wang, Xukai Wang, Jingang Zhou, Hongbo Zhao, Wenzhuo Liu, Shijie Ma, Da-Han Wang, Xu-Yao Zhang, Cheng-Lin Liu

Perceiving Beyond Language Priors: Enhancing Visual Comprehension and Attention in Multimodal Models

Achieving deep alignment between vision and language remains a central challenge for Multimodal Large Language Models (MLLMs). These models often fail to fully leverage visual input, defaulting to strong language priors. Our approach first…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-03 · Aarti Ghatkesar, Ganesh Venkatesh

Bridging the Language Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs

Large language models (LLMs) have revolutionized various domains but still struggle with non-Latin scripts and low-resource languages. This paper addresses the critical challenge of improving multilingual performance without extensive…

Computation and Language · Computer Science · 2025-01-08 · Somnath Kumar, Vaibhav Balloli, Mercy Ranjit, Kabir Ahuja, Sunayana Sitaram, Kalika Bali, Tanuja Ganu, Akshay Nambi

Improving Visual Storytelling with Multimodal Large Language Models

Visual storytelling is an emerging field that combines images and narratives to create engaging and contextually rich stories. Despite its potential, generating coherent and emotionally resonant visual stories remains challenging due to the…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-04 · Xiaochuan Lin, Xiangyong Chen

Augmenting Continual Learning of Diseases with LLM-Generated Visual Concepts

Continual learning is essential for medical image classification systems to adapt to dynamically evolving clinical environments. The integration of multimodal information can significantly enhance continual learning of image classes.…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-06 · Jiantao Tan, Peixian Ma, Kanghao Chen, Zhiming Dai, Ruixuan Wang

X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment

The impressive development of large language models (LLMs) is expanding into the realm of large multimodal models (LMMs), which incorporate multiple types of data beyond text. However, the nature of multimodal models leads to significant…

Computation and Language · Computer Science · 2024-08-05 · Dongjae Shin, Hyeonseok Lim, Inho Won, Changsu Choi, Minjun Kim, Seungwoo Song, Hangyeol Yoo, Sangmin Kim, Kyungtae Lim

An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Language Model Inference

The development of state-of-the-art generative large language models (LLMs) disproportionately relies on English-centric tokenizers, vocabulary and pre-training data. Despite the fact that some LLMs have multilingual capabilities, recent…

Computation and Language · Computer Science · 2024-09-27 · Atsuki Yamaguchi, Aline Villavicencio, Nikolaos Aletras

1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators?

Large Language Models (LLMs) have garnered significant attention due to their remarkable ability to process information across various languages. Despite their capabilities, they exhibit inconsistencies in handling identical queries in…

Computation and Language · Computer Science · 2024-06-24 · Yue Huang, Chenrui Fan, Yuan Li, Siyuan Wu, Tianyi Zhou, Xiangliang Zhang, Lichao Sun

Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning

Multimodal models typically combine a powerful large language model (LLM) with a vision encoder and are then trained on multimodal data via instruction tuning. While this process adapts LLMs to multimodal settings, it remains unclear…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-05 · Neale Ratzlaff, Man Luo, Xin Su, Vasudev Lal, Phillip Howard

Large Language Models Facilitate Vision Reflection in Image Classification

This paper presents several novel findings on the explainability of vision reflection in large multimodal models (LMMs). First, we show that prompting an LMM to verify the prediction of a specialized vision model can improve recognition…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-12 · Guoyuan An, JaeYoon Kim, SungEui Yoon

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Large Language Models (LLMs) represent a class of deep learning models adept at understanding natural language and generating coherent responses to various prompts or queries. These models far exceed the complexity of conventional neural…

Machine Learning · Computer Science · 2024-12-05 · Minghao Shao, Abdul Basit, Ramesh Karri, Muhammad Shafique