Related papers: A Concept-Based Explainability Framework for Large…

Explaining Multi-modal Large Language Models by Analyzing their Vision Perception

Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in understanding and generating content across various modalities, such as images and text. However, their interpretability remains a challenge, hindering…

Computer Vision and Pattern Recognition · Computer Science 2024-05-29 Loris Giulivi, Giacomo Boracchi

Multi-modal Auto-regressive Modeling via Visual Words

Large Language Models (LLMs), benefiting from the auto-regressive modelling approach performed on massive unannotated text corpora, demonstrate powerful perceptual and reasoning capabilities. However, as for extending auto-regressive…

Computer Vision and Pattern Recognition · Computer Science 2024-09-24 Tianshuo Peng, Zuchao Li, Lefei Zhang, Hai Zhao, Ping Wang, Bo Du

Concept-Oriented Deep Learning with Large Language Models

Large Language Models (LLMs) have been successfully used in many natural-language tasks and applications including text generation and AI chatbots. They also are a promising new technology for concept-oriented deep learning (CODL). However,…

Machine Learning · Computer Science 2023-09-21 Daniel T. Chang

Towards Concept-Aware Large Language Models

Concepts play a pivotal role in various human cognitive functions, including learning, reasoning and communication. However, there is very little work on endowing machines with the ability to form and reason with concepts. In particular,…

Computation and Language · Computer Science 2023-11-06 Chen Shani, Jilles Vreeken, Dafna Shahaf

Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey

The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with large language models (LLMs) and computer vision (CV) systems driving advancements in natural language understanding and visual processing,…

Computation and Language · Computer Science 2024-12-04 Yunkai Dang, Kaichen Huang, Jiahao Huo, Yibo Yan, Sirui Huang, Dongrui Liu, Mengxi Gao, Jie Zhang, Chen Qian, Kun Wang, Yong Liu, Jing Shao, Hui Xiong, Xuming Hu

A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models

The rise of foundation models has transformed machine learning research, prompting efforts to uncover their inner workings and develop more efficient and reliable applications for better control. While significant progress has been made in…

Machine Learning · Computer Science 2025-02-26 Zihao Lin, Samyadeep Basu, Mohammad Beigi, Varun Manjunatha, Ryan A. Rossi, Zichao Wang, Yufan Zhou, Sriram Balasubramanian, Arman Zarei, Keivan Rezaei, Ying Shen, Barry Menglong Yao, Zhiyang Xu, Qin Liu, Yuxiang Zhang, Yan Sun, Shilong Liu, Li Shen, Hongxuan Li, Soheil Feizi, Lifu Huang

LLMs Explain't: A Post-Mortem on Semantic Interpretability in Transformer Models

Large Language Models (LLMs) are becoming increasingly popular in pervasive computing due to their versatility and strong performance. However, despite their ubiquitous use, the exact mechanisms underlying their outstanding performance…

Computation and Language · Computer Science 2026-02-02 Alhassan Abdelhalim, Janick Edinger, Sören Laue, Michaela Regneri

A Survey on Benchmarks of Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) are gaining increasing popularity in both academia and industry due to their remarkable performance in various applications such as visual question answering, visual perception, understanding, and…

Computation and Language · Computer Science 2024-09-09 Jian Li, Weiheng Lu, Hao Fei, Meng Luo, Ming Dai, Min Xia, Yizhang Jin, Zhenye Gan, Ding Qi, Chaoyou Fu, Ying Tai, Wankou Yang, Yabiao Wang, Chengjie Wang

The Revolution of Multimodal Large Language Models: A Survey

Connecting text and visual modalities plays an essential role in generative intelligence. For this reason, inspired by the success of large language models, significant research efforts are being devoted to the development of Multimodal…

Computer Vision and Pattern Recognition · Computer Science 2024-06-07 Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli, Sara Sarto, Lorenzo Baraldi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

In recent years, multimodal large language models (MLLMs) have shown remarkable capabilities in tasks like visual question answering and common sense reasoning, while visual perception models have made significant strides in perception…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Guanqun Wang, Xinyu Wei, Jiaming Liu, Ray Zhang, Yichi Zhang, Kevin Zhang, Maurice Chong, Shanghang Zhang

A Principled Framework for Knowledge-enhanced Large Language Model

Large Language Models (LLMs) are versatile, yet they often falter in tasks requiring deep and reliable reasoning due to issues like hallucinations, limiting their applicability in critical scenarios. This paper introduces a rigorously…

Computation and Language · Computer Science 2023-11-21 Saizhuo Wang, Zhihan Liu, Zhaoran Wang, Jian Guo

Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions

The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily adept at processing textual information. To address this…

Computer Vision and Pattern Recognition · Computer Science 2025-10-15 Akash Ghosh, Arkadeep Acharya, Sriparna Saha, Vinija Jain, Aman Chadha

Demystifying Embedding Spaces using Large Language Models

Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream…

Computation and Language · Computer Science 2024-03-14 Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Jihwan Jeong, Lior Shani, Azamat Tulepbergenov, Deepak Ramachandran, Martin Mladenov, Craig Boutilier

UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models

Multimodal Entity Linking (MEL) is a crucial task that aims at linking ambiguous mentions within multimodal contexts to the referent entities in a multimodal knowledge base, such as Wikipedia. Existing methods focus heavily on using complex…

Artificial Intelligence · Computer Science 2024-08-22 Liu Qi, He Yongyi, Lian Defu, Zheng Zhi, Xu Tong, Liu Che, Chen Enhong

Large Language Models: A Mathematical Formulation

Large language models (LLMs) process and predict sequences containing text to answer questions, and address tasks including document summarization, providing recommendations, writing software and solving quantitative problems. We provide a…

Numerical Analysis · Mathematics 2026-02-02 Ricardo Baptista, Andrew Stuart, Son Tran

Large Language Models Facilitate Vision Reflection in Image Classification

This paper presents several novel findings on the explainability of vision reflection in large multimodal models (LMMs). First, we show that prompting an LMM to verify the prediction of a specialized vision model can improve recognition…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Guoyuan An, JaeYoon Kim, SungEui Yoon

Probing Multimodal Large Language Models for Global and Local Semantic Representations

The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications in understanding integrated texts and images. Recent works leverage image-caption datasets to train MLLMs, achieving…

Computation and Language · Computer Science 2024-11-22 Mingxu Tao, Quzhe Huang, Kun Xu, Liwei Chen, Yansong Feng, Dongyan Zhao

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Large Language Models (LLMs) represent a class of deep learning models adept at understanding natural language and generating coherent responses to various prompts or queries. These models far exceed the complexity of conventional neural…

Machine Learning · Computer Science 2024-12-05 Minghao Shao, Abdul Basit, Ramesh Karri, Muhammad Shafique

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Recent advances in Large Multimodal Models (LMMs) lead to significant breakthroughs in both academia and industry. One question that arises is how we, as humans, can understand their internal neural representations. This paper takes an…

Computer Vision and Pattern Recognition · Computer Science 2025-09-19 Kaichen Zhang, Yifei Shen, Bo Li, Ziwei Liu

Exploring Concept Depth: How Large Language Models Acquire Knowledge and Concept at Different Layers?

Large language models (LLMs) have shown remarkable performances across a wide range of tasks. However, the mechanisms by which these models encode tasks of varying complexities remain poorly understood. In this paper, we explore the…

Computation and Language · Computer Science 2025-02-06 Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang