Related papers: Piculet: Specialized Models-Guided Hallucination D…

200 papers

Multi-modal large language models (MLLMs) have been shown to efficiently integrate natural language with visual information to handle multi-modal tasks. However, MLLMs still face a fundamental limitation of hallucinations, where they tend…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-27 · Chaoya Jiang, Haiyang Xu, Mengfan Dong, Jiaxing Chen, Wei Ye, Ming Yan, Qinghao Ye, Ji Zhang, Fei Huang, Shikun Zhang

Existing Large Vision-Language Models (LVLMs) primarily align image features of vision encoder with Large Language Models (LLMs) to leverage their superior text generation capabilities. However, the scale disparity between vision encoder…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-01 · Shi Liu, Kecheng Zheng, Wei Chen

Large Visual Language Models (LVLMs) enhance user interaction and enrich user experience by integrating the visual modality on the basis of Large Language Models (LLMs). They have demonstrated their powerful information processing and…

Artificial Intelligence · Computer Science · 2024-10-22 · Wei Lan, Wenyi Chen, Qingfeng Chen, Shirui Pan, Huiyu Zhou, Yi Pan

This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-03 · Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou

Multimodal large language models (MLLMs) have achieved strong performance on vision-language tasks but still struggle with fine-grained visual differences, leading to hallucinations or missed semantic shifts. We attribute this to…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-10 · Tianyi Bai, Yuxuan Fan, Jiantao Qiu, Fupeng Sun, Jiayi Song, Junlin Han, Zichen Liu, Conghui He, Wentao Zhang, Binhang Yuan

Large Vision Language Models (LVLMs) have achieved significant progress in integrating visual and textual inputs for multimodal reasoning. However, a recurring challenge is ensuring these models utilize visual information as effectively as…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-20 · Estelle Aflalo, Gabriela Ben Melech Stan, Tiep Le, Man Luo, Shachar Rosenman, Sayak Paul, Shao-Yen Tseng, Vasudev Lal

Recent advancements in multimodal large language models (MLLMs) have shown unprecedented capabilities in advancing various vision-language tasks. However, MLLMs face significant challenges with hallucinations, and misleading outputs that do…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-24 · Shengqiong Wu, Hao Fei, Liangming Pan, William Yang Wang, Shuicheng Yan, Tat-Seng Chua

Multimodal Large Language Models (MLLMs) have demonstrated strong performance in visual understanding tasks, yet they often suffer from object hallucinations--generating descriptions of objects that are inconsistent with or entirely absent…

Artificial Intelligence · Computer Science · 2025-05-27 · Xinmiao Hu, Chun Wang, Ruihe An, ChenYu Shao, Xiaojun Ye, Sheng Zhou, Liangcheng Li

Visual hallucinations in Large Language Models (LLMs), where the model generates responses that are inconsistent with the visual input, pose a significant challenge to their reliability, particularly in contexts where precise and…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-30 · Nokimul Hasan Arif, Shadman Rabby, Md Hefzul Hossain Papon, Sabbir Ahmed

Despite achieving rapid developments and with widespread applications, Large Vision-Language Models (LVLMs) confront a serious challenge of being prone to generating hallucinations. An over-reliance on linguistic priors has been identified…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-29 · Lanyun Zhu, Deyi Ji, Tianrun Chen, Peng Xu, Jieping Ye, Jun Liu

Large Vision-Language Models (LVLMs) bridge the gap between visual and linguistic modalities, demonstrating strong potential across a variety of domains. However, despite significant progress, LVLMs still suffer from severe hallucination…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-23 · Ruiqi Ma, Yu Yan, Chunhong Zhang, Minghao Yin, XinChao Liu, Zhihong Jin, Zheng Hu

Vision-Language Models (VLMs) have shown solid ability for multimodal understanding of both visual and language contexts. However, existing VLMs often face severe challenges of hallucinations, meaning that VLMs tend to generate responses…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-14 · Jinjin Cao, Zhiyang Chen, Zijun Wang, Liyuan Ma, Weijian Luo, Guojun Qi

Large Language Models (LLMs) have transformed natural language processing (NLP) tasks, but they suffer from hallucination, generating plausible yet factually incorrect content. This issue extends to Video-Language Models (VideoLLMs), where…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-22 · Ahmad Khalil, Mahmoud Khalil, Alioune Ngom

Multimodal large language models (MLLMs) contribute a powerful mechanism to understanding visual information building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-03 · Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie Wang

Large Vision-Language Models (LVLMs) integrate image encoders with Large Language Models (LLMs) to process multi-modal inputs and perform complex visual tasks. However, they often generate hallucinations by describing non-existent objects…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-25 · Yaqi Sun, Kyohei Atarashi, Koh Takeuchi, Hisashi Kashima

Multimodal hallucination in multimodal large language models (MLLMs) restricts the correctness of MLLMs. However, multimodal hallucinations are multi-sourced and arise from diverse causes. Existing benchmarks fail to adequately distinguish…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-03 · Bowen Dong, Minheng Ni, Zitong Huang, Guanglei Yang, Wangmeng Zuo, Lei Zhang

Multi-modal Large Language Models (MLLMs) demonstrate remarkable success across various vision-language tasks. However, they suffer from visual hallucination, where the generated responses diverge from the provided image. Are MLLMs…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-04 · Dingchen Yang, Bowen Cao, Guang Chen, Changjun Jiang

Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in visual understanding and multimodal reasoning. However, LVLMs frequently exhibit hallucination phenomena, manifesting as the generated textual responses that…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-31 · Ziyun Dai, Xiaoqiang Li, Shaohua Zhang, Yuanchen Wu, Jide Li

Hallucination has been a major problem for large language models and remains a critical challenge when it comes to multimodality in which vision-language models (VLMs) have to deal with not just textual but also visual inputs. Despite rapid…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-23 · Zhecan Wang, Garrett Bingham, Adams Yu, Quoc Le, Thang Luong, Golnaz Ghiasi

Large Language Models (LLMs) have become increasingly important in natural language processing, enabling advanced data analytics through natural language queries. However, these models often generate "hallucinations"-inaccurate or…

Computation and Language · Computer Science · 2024-10-29 · Mikhail Rumiantsau, Aliaksei Vertsel, Ilya Hrytsuk, Isaiah Ballah