Related papers: Benchmarking Visual Language Models on Standardize…

Do LLMs Have Visualization Literacy? An Evaluation on Modified Visualizations to Test Generalization in Data Interpretation

In this paper, we assess the visualization literacy of two prominent Large Language Models (LLMs): OpenAI's Generative Pretrained Transformers (GPT), the backend of ChatGPT, and Google's Gemini, previously known as Bard, to establish…

Performance · Computer Science 2025-01-28 Jiayi Hong , Christian Seto , Arlen Fan , Ross Maciejewski

Do Vision-Language Models See Visualizations Like Humans? Alignment in Chart Categorization

Vision-language models (VLMs) hold promise for enhancing visualization tools, but effective human-AI collaboration hinges on a shared perceptual understanding of visual content. Prior studies assessed VLM visualization literacy through…

Human-Computer Interaction · Computer Science 2025-11-10 Péter Ferenc Gyarmati , Manfred Klaffenböck , Laura Koesten , Torsten Möller

Visualization Literacy of Multimodal Large Language Models: A Comparative Study

The recent introduction of multimodal large language models (MLLMs) combines the inherent power of large language models (LLMs) with the renewed capabilities to reason about the multimodal context. The potential usage scenarios for MLLMs…

Computation and Language · Computer Science 2024-07-17 Zhimin Li , Haichao Miao , Valerio Pascucci , Shusen Liu

Are Large Vision Language Models up to the Challenge of Chart Comprehension and Reasoning? An Extensive Investigation into the Capabilities and Limitations of LVLMs

Natural language is a powerful complementary modality of communication for data visualizations, such as bar and line charts. To facilitate chart-based reasoning using natural language, various downstream tasks have been introduced recently…

Computation and Language · Computer Science 2024-10-07 Mohammed Saidul Islam , Raian Rahman , Ahmed Masry , Md Tahmid Rahman Laskar , Mir Tafseer Nayeem , Enamul Hoque

Probing the Visualization Literacy of Vision Language Models: the Good, the Bad, and the Ugly

Vision Language Models (VLMs) demonstrate promising chart comprehension capabilities. Yet, prior explorations of their visualization literacy have been limited to assessing their response correctness and fail to explore their internal…

Human-Computer Interaction · Computer Science 2025-04-09 Lianghan Dong , Anamaria Crisan

A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges

Multimodal Vision Language Models (VLMs) have emerged as a transformative topic at the intersection of computer vision and natural language processing, enabling machines to perceive and reason about the world through both visual and textual…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Zongxia Li , Xiyang Wu , Hongyang Du , Fuxiao Liu , Huy Nghiem , Guangyao Shi

Vision language models are blind: Failing to translate detailed visual features into words

While large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, score high on many vision-understanding benchmarks, they are still struggling with low-level vision tasks that are easy for humans. Specifically,…

Artificial Intelligence · Computer Science 2025-03-28 Pooyan Rahmanzadehgervi , Logan Bolton , Mohammad Reza Taesiri , Anh Totti Nguyen

ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

Chart understanding presents a unique challenge for large vision-language models (LVLMs), as it requires the integration of sophisticated textual and visual reasoning capabilities. However, current LVLMs exhibit a notable imbalance between…

Computation and Language · Computer Science 2026-02-12 Liyan Tang , Grace Kim , Xinyu Zhao , Thom Lake , Wenxuan Ding , Fangcong Yin , Prasann Singhal , Manya Wadhwa , Zeyu Leo Liu , Zayne Sprague , Ramya Namuduri , Bodun Hu , Juan Diego Rodriguez , Puyuan Peng , Greg Durrett

Adaptive Assessment of Visualization Literacy

Visualization literacy is an essential skill for accurately interpreting data to inform critical decisions. Consequently, it is vital to understand the evolution of this ability and devise targeted interventions to enhance it, requiring…

Human-Computer Interaction · Computer Science 2023-08-29 Yuan Cui , Lily W. Ge , Yiren Ding , Fumeng Yang , Lane Harrison , Matthew Kay

Charts-of-Thought: Enhancing LLM Visualization Literacy Through Structured Data Extraction

This paper evaluates the visualization literacy of modern Large Language Models (LLMs) and introduces a novel prompting technique called Charts-of-Thought. We tested three state-of-the-art LLMs (Claude-3.7-sonnet, GPT-4.5 preview, and…

Human-Computer Interaction · Computer Science 2025-12-11 Amit Kumar Das , Mohammad Tarun , Klaus Mueller

VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs

Vision-Language Models (VLMs) excel at complex visual tasks such as VQA and chart understanding, yet recent work suggests they struggle with simple perceptual tests. We present an evaluation of vision-language models' capacity for nonlocal…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Shmuel Berman , Jia Deng

The Plot Thickens: Quantitative Part-by-Part Exploration of MLLM Visualization Literacy

Multimodal Large Language Models (MLLMs) can interpret data visualizations, but what makes a visualization understandable to these models? Do factors like color, shape, and text influence legibility, and how does this compare to human…

Human-Computer Interaction · Computer Science 2025-04-04 Matheus Valentim , Vaishali Dhanoa , Gabriela Molina León , Niklas Elmqvist

Benchmarking Vision-Language Models on Optical Character Recognition in Dynamic Video Environments

This paper introduces an open-source benchmark for evaluating Vision-Language Models (VLMs) on Optical Character Recognition (OCR) tasks in dynamic video environments. We present a curated dataset containing 1,477 manually annotated frames…

Computer Vision and Pattern Recognition · Computer Science 2025-02-11 Sankalp Nagaonkar , Augustya Sharma , Ashish Choithani , Ashutosh Trivedi

Can Vision-Language Models See Squares? Text-Recognition Mediates Spatial Reasoning Across Three Model Families

We present a simple experiment that exposes a fundamental limitation in vision-language models (VLMs): the inability to accurately localize filled cells in binary grids when those cells lack textual identity. We generate fifteen 15x15 grids…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Yuval Levental

Mini-VLAT: A Short and Effective Measure of Visualization Literacy

The visualization community regards visualization literacy as a necessary skill. Yet, despite the recent increase in research into visualization literacy by the education and visualization communities, we lack practical and time-effective…

Human-Computer Interaction · Computer Science 2023-08-09 Saugat Pandey , Alvitta Ottley

Benchmarking Vision Language Models for Cultural Understanding

Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their performance has been typically assessed on general scene…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Shravan Nayak , Kanishk Jain , Rabiul Awal , Siva Reddy , Sjoerd van Steenkiste , Lisa Anne Hendricks , Karolina Stańczak , Aishwarya Agrawal

Do MLLMs See What We See? Analyzing Visualization Literacy Barriers in AI Systems

Multimodal Large Language Models (MLLMs) are increasingly used to interpret visualizations, yet little is known about why they fail. We present the first systematic analysis of barriers to visualization literacy in MLLMs. Using the…

Human-Computer Interaction · Computer Science 2026-01-21 Mengli Duan , Yuhe Jiang , Matthew Varona , Carolina Nobre

Benchmarking and Evaluating VLMs for Software Architecture Diagram Understanding

Software architecture diagrams are important design artifacts for communicating system structure, behavior, and data organization throughout the software development lifecycle. Although recent progress in large language models has…

Software Engineering · Computer Science 2026-04-07 Shuyin Ouyang , Jie M. Zhang , Jingzhi Gong , Gunel Jahangirova , Mohammad Reza Mousavi , Jack Johns , Beum Seuk Lee , Adam Ziolkowski , Botond Virginas , Joost Noppen

Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; adoption that has fueled a wealth of new models such as LLaVa, InstructBLIP, and…

Computer Vision and Pattern Recognition · Computer Science 2024-05-31 Siddharth Karamcheti , Suraj Nair , Ashwin Balakrishna , Percy Liang , Thomas Kollar , Dorsa Sadigh

Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking

Current benchmarks for evaluating Vision Language Models (VLMs) often fall short in thoroughly assessing model abilities to understand and process complex visual and textual content. They typically focus on simple tasks that do not require…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Harsha Vardhan Khurdula , Basem Rizk , Indus Khaitan , Janit Anjaria , Aviral Srivastava , Rajvardhan Khaitan