Related papers: VACoDe: Visual Augmented Contrastive Decoding

200 papers

While large vision-language models (LVLMs) have shown impressive capabilities in generating plausible responses correlated with input visual contents, they still suffer from hallucinations, where the generated text inaccurately reflects…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-10 · Yi-Lun Lee, Yi-Hsuan Tsai, Wei-Chen Chiu

Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent but also contextually attuned. Despite their success, LVLMs still…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-29 · Sicong Leng, Hang Zhang, Guanzheng Chen, Xin Li, Shijian Lu, Chunyan Miao, Lidong Bing

Large Vision-Language Models (LVLMs) have demonstrated remarkable multimodal capabilities, but they inherit the tendency to hallucinate from their underlying language models. While visual contrastive decoding has been proposed to mitigate…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-04 · Eun Woo Im, Muhammad Kashif Ali, Vivek Gupta

Large vision-language models (LVLMs) have shown remarkable performance in visual-language understanding for downstream multimodal tasks. While their capabilities are improving, problems emerge simultaneously. Among those problems, the…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-06 · Jingyuan Deng, Yujiu Yang

Contrastive decoding strategies are widely used to mitigate object hallucinations in multimodal large language models (MLLMs). By reducing over-reliance on language priors, these strategies ensure that generated content remains closely…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-28 · Hao Yin, Guangzong Si, Zilei Wang

Large Vision-Language Models (LVLMs) are an extension of Large Language Models (LLMs) that facilitate processing both image and text inputs, expanding AI capabilities. However, LVLMs struggle with object hallucinations due to their reliance…

Computation and Language · Computer Science · 2024-08-12 · Avshalom Manevich, Reut Tsarfaty

Recent studies have shown that Large Vision-Language Models (VLMs) tend to neglect image content and over-rely on language-model priors, resulting in errors in visually grounded tasks and hallucinations. We hypothesize that this issue…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-04 · Shengguang Wu, Fan-Yun Sun, Kaiyue Wen, Nick Haber

Hallucination remains a major challenge in multimodal large language models (MLLMs). To address this, various contrastive decoding (CD) methods have been proposed that contrast original logits with hallucinated logits generated from…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-01 · Chaeyoung Jung, Youngjoon Jang, Joon Son Chung

Over-reliance on language priors is a major cause of hallucinations in Large Vision-Language Models (LVLMs), often leading to outputs that are linguistically plausible but visually inconsistent. Recent studies have explored contrastive…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-17 · Jianfei Zhao, Feng Zhang, Xin Sun, Lingxing Kong, Zhixing Tan, Chong Feng

Large vision-language models (LVLMs) are now central to healthcare applications such as medical visual question answering and imaging report generation. Yet, these models remain vulnerable to hallucination outputs that appear plausible but…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-02 · Zahra Mahdavi, Zahra Khodakaramimaghsoud, Hooman Khaloo, Sina Bakhshandeh Taleshani, Erfan Hashemi, Javad Mirzapour Kaleybar, Omid Nejati Manzari

Despite significant advancements in Vision-Language Models (VLMs), the performance of existing VLMs remains hindered by object hallucination, a critical challenge to achieving accurate visual understanding. To address this issue, we propose…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-31 · Woohyeon Park, Woojin Kim, Jaeik Kim, Jaeyoung Do

We study object hallucination in Multimodal Large Language Models (MLLMs) and improve visual contrastive decoding (VCD) by constructing an object-aligned auxiliary view. We leverage object-centric attention in self-supervised Vision…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-13 · Boqi Chen, Xudong Liu, Jianing Qiu

Large Visual Language Models (LVLMs) integrate visual and linguistic modalities, exhibiting exceptional performance across various multimodal tasks. Nevertheless, LVLMs remain vulnerable to the issue of object hallucinations. Previous…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-04 · Chao Wang, Xuancheng Zhou, Weiwei Fu, Yang Zhou

Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-02 · Xinyu Lyu, Beitao Chen, Lianli Gao, Jingkuan Song, Heng Tao Shen

While visual data augmentation remains a cornerstone for training robust vision models, it has received limited attention in visual language models (VLMs), which predominantly rely on large-scale real data acquisition or synthetic…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-03 · Zhengzhuo Xu, Chong Sun, SiNan Du, Chen Li, Jing Lyu, Chun Yuan

Large Multi-modal Models (LMMs) have recently demonstrated remarkable abilities in visual context understanding and coherent response generation. However, alongside these advancements, the issue of hallucinations has emerged as a…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-05 · Junho Kim, Hyunjun Kim, Yeonju Kim, Yong Man Ro

Although multimodal large language models (MLLMs) exhibit remarkable reasoning capabilities on complex multimodal understanding tasks, they still suffer from the notorious hallucination issue: generating outputs misaligned with obvious…

Machine Learning · Computer Science · 2025-11-04 · Wei Chen, Xin Yan, Bin Wen, Fan Yang, Tingting Gao, Di Zhang, Long Chen

Large Vision-Language Models (LVLMs) are increasingly adept at generating contextually detailed and coherent responses from visual inputs. However, their application in multimodal decision-making and open-ended generation is hindered by a…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-06 · Xintong Wang, Jingheng Pan, Liang Ding, Chris Biemann

Large language models (LLMs) have demonstrated exceptional proficiency in language understanding. However, when LLMs align their outputs with deceptive and/or misleading prompts, the generated responses could deviate from the de facto…

Computation and Language · Computer Science · 2025-09-03 · Zixuan Shangguan, Yanjie Dong, Lanjun Wang, Xiaoyi Fan, Victor C. M. Leung, Xiping Hu

Large vision-language models (LVLMs) are increasingly being applied to multi-view image inputs captured from diverse viewpoints. However, despite this growing use, current LVLMs often confuse or mismatch visual information originating from…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-26 · Wooje Park, Insu Lee, Soohyun Kim, Jaeyun Jang, Minyoung Noh, Kyuhong Shim, Byonghyo Shim