Related papers: Benchmarking Visual Language Models on Standardize…

200 papers

Chart question answering (CQA) is a crucial area of Visual Language Understanding. However, the robustness and consistency of current Visual Language Models (VLMs) in this field remain under-explored. This paper evaluates state-of-the-art…

Computation and Language · Computer Science · 2024-10-07 · Srija Mukhopadhyay, Adnan Qidwai, Aparna Garimella, Pritika Ramu, Vivek Gupta, Dan Roth

The advent of Vision Language Models (VLM) has allowed researchers to investigate the visual understanding of a neural network using natural language. Beyond object classification and detection, VLMs are capable of visual comprehension and…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-12 · Haz Sameen Shahgir, Khondker Salman Sayeed, Abhik Bhattacharjee, Wasi Uddin Ahmad, Yue Dong, Rifat Shahriyar

Vision Language Models (VLMs) are pivotal for advancing perception in intelligent agents. Yet, evaluation of VLMs remains limited to predominantly English-centric benchmarks in which the image-text pairs comprise short texts. To evaluate…

Computation and Language · Computer Science · 2025-10-16 · Jesse Atuhurra, Iqra Ali, Tomoya Iwakura, Hidetaka Kamigaito, Tatsuya Hiraoka

Vision language models (VLMs) show strong results on chart understanding, yet existing benchmarks assume clean figures and fact-based queries. Real-world charts often contain distortions and demand reasoning beyond simple matching. We…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-24 · Philip Wootaek Shin, Jack Sampson, Vijaykrishnan Narayanan, Andres Marquez, Mahantesh Halappanavar

Large Vision-Language Models (LVLMs) have demonstrated remarkable performance across diverse tasks. Despite great success, recent studies show that LVLMs encounter substantial limitations when engaging with visual graphs. To study the…

Computation and Language · Computer Science · 2025-06-09 · Yingjie Zhu, Xuefeng Bai, Kehai Chen, Yang Xiang, Jun Yu, Min Zhang

Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, or toxicity. Furthermore, they differ in…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-25 · Tony Lee, Haoqin Tu, Chi Heem Wong, Wenhao Zheng, Yiyang Zhou, Yifan Mai, Josselin Somerville Roberts, Michihiro Yasunaga, Huaxiu Yao, Cihang Xie, Percy Liang

Vision-Language Models like GPT-4, LLaVA, and CogVLM have surged in popularity recently due to their impressive performance in several vision-language tasks. Current evaluation methods, however, overlook an essential component: uncertainty,…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-27 · Vasily Kostumov, Bulat Nutfullin, Oleg Pilipenko, Eugene Ilyushin

Large language models (LLMs) have increased interest in vision language models (VLMs), which process image-text pairs as input. Studies investigating the visual understanding ability of VLMs have been proposed, but such studies are still…

Computation and Language · Computer Science · 2024-06-25 · Jesse Atuhurra, Iqra Ali, Tatsuya Hiraoka, Hidetaka Kamigaito, Tomoya Iwakura, Taro Watanabe

While Vision-Language Models (VLMs) have achieved competitive performance in various tasks, their comprehension of the underlying structure and semantics of a scene remains understudied. To investigate the understanding of VLMs, we study…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-23 · Massimo Rizzoli, Simone Alghisi, Olha Khomyn, Gabriel Roccabruna, Seyed Mahed Mousavi, Giuseppe Riccardi

Visual language is a system of communication that conveys information through symbols, shapes, and spatial arrangements. Diagrams are a typical example of a visual language depicting complex concepts and their relationships in the form of…

Computation and Language · Computer Science · 2025-05-27 · Yifan Hou, Buse Giledereli, Yilei Tu, Mrinmaya Sachan

Visual Language Models (VLMs) show remarkable performance in visual reasoning tasks, successfully tackling college-level challenges that require high-level understanding of images. However, some recent reports of VLMs struggling to reason…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-17 · Gene Tangtartharakul, Katherine R. Storrs

Language and Vision-Language Models (LLMs/VLMs) have revolutionized the field of AI by their ability to generate human-like text and understand images, but ensuring their reliability is crucial. This paper aims to evaluate the ability of…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-07 · Tobias Groot, Matias Valdenegro-Toro

Data visualization principles, derived from decades of research in design and perception, ensure proper visual communication. While prior work has shown that large language models (LLMs) can generate charts or flag misleading figures, it…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-24 · Martin Sinnona, Valentin Bonas, Viviana Siless, Emmanuel Iarussi

We introduce VisualQuest, a novel dataset designed to rigorously evaluate multimodal large language models (MLLMs) on abstract visual reasoning tasks that require the integration of symbolic, cultural, and linguistic knowledge. Unlike…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-05 · Kelaiti Xiao, Liang Yang, Dongyu Zhang, Paerhati Tulajiang, Hongfei Lin

Information visualizations are powerful tools that help users quickly identify patterns, trends, and outliers, facilitating informed decision-making. However, when visualizations incorporate deceptive design elements, such as truncated or…

Computation and Language · Computer Science · 2025-08-14 · Ridwan Mahbub, Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Mizanur Rahman, Mir Tafseer Nayeem, Enamul Hoque

This paper presents novel benchmarks for evaluating vision-language models (VLMs) in zero-shot recognition, focusing on granularity and specificity. Although VLMs excel in tasks like image captioning, they face challenges in open-world…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-19 · Zhenlin Xu, Yi Zhu, Tiffany Deng, Abhay Mittal, Yanbei Chen, Manchen Wang, Paolo Favaro, Joseph Tighe, Davide Modolo

Charts are ubiquitous as they help people understand and reason with data. Recently, various downstream tasks, such as chart question answering, chart2text, and fact-checking, have emerged. Large Vision-Language Models (LVLMs) show promise…

Vision-Language Models (VLMs) have achieved impressive performance in cross-modal understanding across textual and visual inputs, yet existing benchmarks predominantly focus on pure-text queries. In real-world scenarios, language also…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-19 · Qing'an Liu, Juntong Feng, Yuhao Wang, Xinzhe Han, Yujie Cheng, Yue Zhu, Haiwen Diao, Yunzhi Zhuge, Huchuan Lu

The growing sophistication of deepfakes presents substantial challenges to the integrity of media and the preservation of public trust. Concurrently, vision-language models (VLMs), large language models enhanced with visual reasoning…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-13 · Shahroz Tariq, David Nguyen, M. A. P. Chamikara, Tingmin Wu, Alsharif Abuadbba, Kristen Moore

Vision-language models (VLMs) perform well on many document understanding tasks, yet their reliability in specialized, non-English domains remains underexplored. This gap is especially critical in finance, where documents mix dense…

Computation and Language · Computer Science · 2026-03-17 · Virginie Mouilleron, Théo Lasnier, Anna Mosolova, Djamé Seddah