Related papers: Benchmarking Visual Language Models on Standardize…

200 papers

In this paper, we assess the visualization literacy of two prominent Large Language Models (LLMs): OpenAI's Generative Pretrained Transformers (GPT), the backend of ChatGPT, and Google's Gemini, previously known as Bard, to establish…

Performance · Computer Science 2025-01-28 Jiayi Hong , Christian Seto , Arlen Fan , Ross Maciejewski

Vision-language models (VLMs) hold promise for enhancing visualization tools, but effective human-AI collaboration hinges on a shared perceptual understanding of visual content. Prior studies assessed VLM visualization literacy through…

Human-Computer Interaction · Computer Science 2025-11-10 Péter Ferenc Gyarmati , Manfred Klaffenböck , Laura Koesten , Torsten Möller

The recent introduction of multimodal large language models (MLLMs) combines the inherent power of large language models (LLMs) with the renewed capabilities to reason about the multimodal context. The potential usage scenarios for MLLMs…

Computation and Language · Computer Science 2024-07-17 Zhimin Li , Haichao Miao , Valerio Pascucci , Shusen Liu

Natural language is a powerful complementary modality of communication for data visualizations, such as bar and line charts. To facilitate chart-based reasoning using natural language, various downstream tasks have been introduced recently…

Computation and Language · Computer Science 2024-10-07 Mohammed Saidul Islam , Raian Rahman , Ahmed Masry , Md Tahmid Rahman Laskar , Mir Tafseer Nayeem , Enamul Hoque

Vision Language Models (VLMs) demonstrate promising chart comprehension capabilities. Yet, prior explorations of their visualization literacy have been limited to assessing their response correctness and fail to explore their internal…

Human-Computer Interaction · Computer Science 2025-04-09 Lianghan Dong , Anamaria Crisan

Multimodal Vision Language Models (VLMs) have emerged as a transformative topic at the intersection of computer vision and natural language processing, enabling machines to perceive and reason about the world through both visual and textual…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Zongxia Li , Xiyang Wu , Hongyang Du , Fuxiao Liu , Huy Nghiem , Guangyao Shi

While large language models with vision capabilities (VLMs), e.g., GPT-4o and Gemini 1.5 Pro, score high on many vision-understanding benchmarks, they are still struggling with low-level vision tasks that are easy to humans. Specifically,…

Artificial Intelligence · Computer Science 2025-03-28 Pooyan Rahmanzadehgervi , Logan Bolton , Mohammad Reza Taesiri , Anh Totti Nguyen

Chart understanding presents a unique challenge for large vision-language models (LVLMs), as it requires the integration of sophisticated textual and visual reasoning capabilities. However, current LVLMs exhibit a notable imbalance between…

Visualization literacy is an essential skill for accurately interpreting data to inform critical decisions. Consequently, it is vital to understand the evolution of this ability and devise targeted interventions to enhance it, requiring…

Human-Computer Interaction · Computer Science 2023-08-29 Yuan Cui , Lily W. Ge , Yiren Ding , Fumeng Yang , Lane Harrison , Matthew Kay

This paper evaluates the visualization literacy of modern Large Language Models (LLMs) and introduces a novel prompting technique called Charts-of-Thought. We tested three state-of-the-art LLMs (Claude-3.7-sonnet, GPT-4.5 preview, and…

Human-Computer Interaction · Computer Science 2025-12-11 Amit Kumar Das , Mohammad Tarun , Klaus Mueller

Vision-Language Models (VLMs) excel at complex visual tasks such as VQA and chart understanding, yet recent work suggests they struggle with simple perceptual tests. We present an evaluation of vision-language models' capacity for nonlocal…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Shmuel Berman , Jia Deng

Multimodal Large Language Models (MLLMs) can interpret data visualizations, but what makes a visualization understandable to these models? Do factors like color, shape, and text influence legibility, and how does this compare to human…

Human-Computer Interaction · Computer Science 2025-04-04 Matheus Valentim , Vaishali Dhanoa , Gabriela Molina León , Niklas Elmqvist

This paper introduces an open-source benchmark for evaluating Vision-Language Models (VLMs) on Optical Character Recognition (OCR) tasks in dynamic video environments. We present a curated dataset containing 1,477 manually annotated frames…

Computer Vision and Pattern Recognition · Computer Science 2025-02-11 Sankalp Nagaonkar , Augustya Sharma , Ashish Choithani , Ashutosh Trivedi

We present a simple experiment that exposes a fundamental limitation in vision-language models (VLMs): the inability to accurately localize filled cells in binary grids when those cells lack textual identity. We generate fifteen 15x15 grids…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Yuval Levental

The visualization community regards visualization literacy as a necessary skill. Yet, despite the recent increase in research into visualization literacy by the education and visualization communities, we lack practical and time-effective…

Human-Computer Interaction · Computer Science 2023-08-09 Saugat Pandey , Alvitta Ottley

Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their performance has been typically assessed on general scene…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Shravan Nayak , Kanishk Jain , Rabiul Awal , Siva Reddy , Sjoerd van Steenkiste , Lisa Anne Hendricks , Karolina Stańczak , Aishwarya Agrawal

Multimodal Large Language Models (MLLMs) are increasingly used to interpret visualizations, yet little is known about why they fail. We present the first systematic analysis of barriers to visualization literacy in MLLMs. Using the…

Human-Computer Interaction · Computer Science 2026-01-21 Mengli Duan , Yuhe Jiang , Matthew Varona , Carolina Nobre

Software architecture diagrams are important design artifacts for communicating system structure, behavior, and data organization throughout the software development lifecycle. Although recent progress in large language models has…

Visually-conditioned language models (VLMs) have seen growing adoption in applications such as visual dialogue, scene understanding, and robotic task planning; adoption that has fueled a wealth of new models such as LLaVa, InstructBLIP, and…

Computer Vision and Pattern Recognition · Computer Science 2024-05-31 Siddharth Karamcheti , Suraj Nair , Ashwin Balakrishna , Percy Liang , Thomas Kollar , Dorsa Sadigh

Current benchmarks for evaluating Vision Language Models (VLMs) often fall short in thoroughly assessing model abilities to understand and process complex visual and textual content. They typically focus on simple tasks that do not require…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Harsha Vardhan Khurdula , Basem Rizk , Indus Khaitan , Janit Anjaria , Aviral Srivastava , Rajvardhan Khaitan