Related papers: Benchmarking Visual Language Models on Standardize…

Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness

Chart question answering (CQA) is a crucial area of Visual Language Understanding. However, the robustness and consistency of current Visual Language Models (VLMs) in this field remain under-explored. This paper evaluates state-of-the-art…

Computation and Language · Computer Science 2024-10-07 Srija Mukhopadhyay, Adnan Qidwai, Aparna Garimella, Pritika Ramu, Vivek Gupta, Dan Roth

IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models

The advent of Vision Language Models (VLM) has allowed researchers to investigate the visual understanding of a neural network using natural language. Beyond object classification and detection, VLMs are capable of visual comprehension and…

Computer Vision and Pattern Recognition · Computer Science 2024-08-12 Haz Sameen Shahgir, Khondker Salman Sayeed, Abhik Bhattacharjee, Wasi Uddin Ahmad, Yue Dong, Rifat Shahriyar

VLURes: Benchmarking VLM Visual and Linguistic Understanding in Low-Resource Languages

Vision Language Models (VLMs) are pivotal for advancing perception in intelligent agents. Yet, evaluation of VLMs remains limited to predominantly English-centric benchmarks in which the image-text pairs comprise short texts. To evaluate…

Computation and Language · Computer Science 2025-10-16 Jesse Atuhurra, Iqra Ali, Tomoya Iwakura, Hidetaka Kamigaito, Tatsuya Hiraoka

Losing the Plot: How VLM responses degrade on imperfect charts

Vision language models (VLMs) show strong results on chart understanding, yet existing benchmarks assume clean figures and fact-based queries. Real-world charts often contain distortions and demand reasoning beyond simple matching. We…

Computer Vision and Pattern Recognition · Computer Science 2025-09-24 Philip Wootaek Shin, Jack Sampson, Vijaykrishnan Narayanan, Andres Marquez, Mahantesh Halappanavar

Benchmarking and Improving Large Vision-Language Models for Fundamental Visual Graph Understanding and Reasoning

Large Vision-Language Models (LVLMs) have demonstrated remarkable performance across diverse tasks. Despite great success, recent studies show that LVLMs encounter substantial limitations when engaging with visual graphs. To study the…

Computation and Language · Computer Science 2025-06-09 Yingjie Zhu, Xuefeng Bai, Kehai Chen, Yang Xiang, Jun Yu, Min Zhang

VHELM: A Holistic Evaluation of Vision Language Models

Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, or toxicity. Furthermore, they differ in…

Computer Vision and Pattern Recognition · Computer Science 2024-10-25 Tony Lee, Haoqin Tu, Chi Heem Wong, Wenhao Zheng, Yiyang Zhou, Yifan Mai, Josselin Somerville Roberts, Michihiro Yasunaga, Huaxiu Yao, Cihang Xie, Percy Liang

Uncertainty-Aware Evaluation for Vision-Language Models

Vision-Language Models like GPT-4, LLaVA, and CogVLM have surged in popularity recently due to their impressive performance in several vision-language tasks. Current evaluation methods, however, overlook an essential component: uncertainty,…

Computer Vision and Pattern Recognition · Computer Science 2024-02-27 Vasily Kostumov, Bulat Nutfullin, Oleg Pilipenko, Eugene Ilyushin

Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models

Large language models (LLMs) have increased interest in vision language models (VLMs), which process image-text pairs as input. Studies investigating the visual understanding ability of VLMs have been proposed, but such studies are still…

Computation and Language · Computer Science 2024-06-25 Jesse Atuhurra, Iqra Ali, Tatsuya Hiraoka, Hidetaka Kamigaito, Tomoya Iwakura, Taro Watanabe

CIVET: Systematic Evaluation of Understanding in VLMs

While Vision-Language Models (VLMs) have achieved competitive performance in various tasks, their comprehension of the underlying structure and semantics of a scene remains understudied. To investigate the understanding of VLMs, we study…

Computer Vision and Pattern Recognition · Computer Science 2025-06-23 Massimo Rizzoli, Simone Alghisi, Olha Khomyn, Gabriel Roccabruna, Seyed Mahed Mousavi, Giuseppe Riccardi

Do Vision-Language Models Really Understand Visual Language?

Visual language is a system of communication that conveys information through symbols, shapes, and spatial arrangements. Diagrams are a typical example of a visual language depicting complex concepts and their relationships in the form of…

Computation and Language · Computer Science 2025-05-27 Yifan Hou, Buse Giledereli, Yilei Tu, Mrinmaya Sachan

Visual Language Models show widespread visual deficits on neuropsychological tests

Visual Language Models (VLMs) show remarkable performance in visual reasoning tasks, successfully tackling college-level challenges that require high-level understanding of images. However, some recent reports of VLMs struggling to reason…

Computer Vision and Pattern Recognition · Computer Science 2025-04-17 Gene Tangtartharakul, Katherine R. Storrs

Overconfidence is Key: Verbalized Uncertainty Evaluation in Large Language and Vision-Language Models

Language and Vision-Language Models (LLMs/VLMs) have revolutionized the field of AI by their ability to generate human-like text and understand images, but ensuring their reliability is crucial. This paper aims to evaluate the ability of…

Computer Vision and Pattern Recognition · Computer Science 2024-05-07 Tobias Groot, Matias Valdenegro-Toro

Do Large Language Models Understand Data Visualization Principles?

Data visualization principles, derived from decades of research in design and perception, ensure proper visual communication. While prior work has shown that large language models (LLMs) can generate charts or flag misleading figures, it…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Martin Sinnona, Valentin Bonas, Viviana Siless, Emmanuel Iarussi

VisualQuest: A Benchmark for Abstract Visual Reasoning in MLLMs

We introduce VisualQuest, a novel dataset designed to rigorously evaluate multimodal large language models (MLLMs) on abstract visual reasoning tasks that require the integration of symbolic, cultural, and linguistic knowledge. Unlike…

Computer Vision and Pattern Recognition · Computer Science 2026-01-05 Kelaiti Xiao, Liang Yang, Dongyu Zhang, Paerhati Tulajiang, Hongfei Lin

The Perils of Chart Deception: How Misleading Visualizations Affect Vision-Language Models

Information visualizations are powerful tools that help users quickly identify patterns, trends, and outliers, facilitating informed decision-making. However, when visualizations incorporate deceptive design elements, such as truncated or…

Computation and Language · Computer Science 2025-08-14 Ridwan Mahbub, Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Mizanur Rahman, Mir Tafseer Nayeem, Enamul Hoque

Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity

This paper presents novel benchmarks for evaluating vision-language models (VLMs) in zero-shot recognition, focusing on granularity and specificity. Although VLMs excel in tasks like image captioning, they face challenges in open-world…

Computer Vision and Pattern Recognition · Computer Science 2024-06-19 Zhenlin Xu, Yi Zhu, Tiffany Deng, Abhay Mittal, Yanbei Chen, Manchen Wang, Paolo Favaro, Joseph Tighe, Davide Modolo

Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning?

Charts are ubiquitous as they help people understand and reason with data. Recently, various downstream tasks, such as chart question answering, chart2text, and fact-checking, have emerged. Large Vision-Language Models (LVLMs) show promise…

Computation and Language · Computer Science 2025-07-08 Md Tahmid Rahman Laskar, Mohammed Saidul Islam, Ridwan Mahbub, Ahmed Masry, Mizanur Rahman, Amran Bhuiyan, Mir Tafseer Nayeem, Shafiq Joty, Enamul Hoque, Jimmy Huang

VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?

Vision-Language Models (VLMs) have achieved impressive performance in cross-modal understanding across textual and visual inputs, yet existing benchmarks predominantly focus on pure-text queries. In real-world scenarios, language also…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Qing'an Liu, Juntong Feng, Yuhao Wang, Xinzhe Han, Yujie Cheng, Yue Zhu, Haiwen Diao, Yunzhi Zhuge, Huchuan Lu

LLMs Are Not Yet Ready for Deepfake Image Detection

The growing sophistication of deepfakes presents substantial challenges to the integrity of media and the preservation of public trust. Concurrently, vision-language models (VLMs), large language models enhanced with visual reasoning…

Computer Vision and Pattern Recognition · Computer Science 2025-06-13 Shahroz Tariq, David Nguyen, M. A. P. Chamikara, Tingmin Wu, Alsharif Abuadbba, Kristen Moore

When Tables Go Crazy: Evaluating Multimodal Models on French Financial Documents

Vision-language models (VLMs) perform well on many document understanding tasks, yet their reliability in specialized, non-English domains remains underexplored. This gap is especially critical in finance, where documents mix dense…

Computation and Language · Computer Science 2026-03-17 Virginie Mouilleron, Théo Lasnier, Anna Mosolova, Djamé Seddah