Related papers: Visual Programmability: A Guide for Code-as-Though…

200 papers

Recent advances in Large Language Models (LLMs) and Vision Language Models (VLMs) have shown significant progress in mathematical reasoning, yet they still face a critical bottleneck with problems requiring visual assistance, such as…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Chengqi Duan, Kaiyue Sun, Rongyao Fang, Manyuan Zhang, Yan Feng, Ying Luo, Yufang Liu, Ke Wang, Peng Pei, Xunliang Cai, Hongsheng Li, Yi Ma, Xihui Liu

Chart reasoning presents unique challenges due to its inherent complexity -- requiring precise numerical comprehension, multi-level visual understanding, and logical inference across interconnected data elements. Existing vision-language…

Artificial Intelligence · Computer Science 2026-03-17 Lei Chen, Xuanle Zhao, Zhixiong Zeng, Jing Huang, Yufeng Zhong, Lin Ma

Solving complex chart Q&A tasks requires advanced visual reasoning abilities in multimodal large language models (MLLMs), including recognizing key information from visual inputs and conducting reasoning over it. While fine-tuning MLLMs for…

Computation and Language · Computer Science 2025-09-03 Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang

Large vision-language models (LVLMs) struggle to reliably detect visual primitives in charts and align them with semantic representations, which severely limits their performance on complex visual reasoning. This lack of perceptual…

Artificial Intelligence · Computer Science 2026-03-13 Eunsoo Lee, Jeongwoo Lee, Minki Hong, Jangho Choi, Jihie Kim

The recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence requiring robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must integrate with visual comprehension,…

Artificial Intelligence · Computer Science 2026-04-06 Yunfei Bai, Amit Dhanda, Shekhar Jain

Vision-language models (VLMs) hold promise for enhancing visualization tools, but effective human-AI collaboration hinges on a shared perceptual understanding of visual content. Prior studies assessed VLM visualization literacy through…

Human-Computer Interaction · Computer Science 2025-11-10 Péter Ferenc Gyarmati, Manfred Klaffenböck, Laura Koesten, Torsten Möller

Chain-of-Thought (CoT) prompting has proven highly effective for enhancing complex reasoning in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs). Yet, it struggles in complex spatial reasoning tasks. Nonetheless,…

Computation and Language · Computer Science 2025-01-14 Chengzu Li, Wenshan Wu, Huanyu Zhang, Yan Xia, Shaoguang Mao, Li Dong, Ivan Vulić, Furu Wei

Aiming to identify precise evidence sources from visual documents, visual evidence attribution for visual document retrieval-augmented generation (VD-RAG) ensures reliable and verifiable predictions from vision-language models (VLMs) in…

Artificial Intelligence · Computer Science 2025-12-02 Shuochen Liu, Pengfei Luo, Chao Zhang, Yuhao Chen, Haotian Zhang, Qi Liu, Xin Kou, Tong Xu, Enhong Chen

Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. However, current training recipes lack robust CoT reasoning data, relying on datasets dominated by short…

Artificial Intelligence · Computer Science 2024-10-22 Ruohong Zhang, Bowen Zhang, Yanghao Li, Haotian Zhang, Zhiqing Sun, Zhe Gan, Yinfei Yang, Ruoming Pang, Yiming Yang

Vision Language Models (VLMs) often struggle with chart understanding tasks, particularly in accurate chart description and complex reasoning. Synthetic data generation is a promising solution, while usually facing the challenge of noise…

Artificial Intelligence · Computer Science 2025-08-19 Gongyao Jiang, Qiong Luo

The capabilities of Large Vision-Language Models (LVLMs) have reached state-of-the-art on many visual reasoning tasks, including chart reasoning, yet they still falter on out-of-distribution (OOD) data, and degrade further when asked to…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Sanchit Sinha, Oana Frunza, Kashif Rasul, Yuriy Nevmyvaka, Aidong Zhang

As large vision language models (VLMs) advance, their capabilities in multilingual visual question answering (mVQA) have significantly improved. Chain-of-thought (CoT) reasoning has been proven to enhance interpretability and complex…

Computer Vision and Pattern Recognition · Computer Science 2026-04-15 Jing Huang, Zhiya Tan, Shutao Gong, Fanwei Zeng, Joey Tianyi Zhou, Changtao Miao, Huazhe Tan, Weibin Yao, Jianshu Li

Reasoning capability is pivotal for Large Language Models (LLMs) to solve complex tasks, yet achieving reliable and scalable reasoning remains challenging. While Chain-of-Thought (CoT) prompting has become a mainstream approach, existing…

Computation and Language · Computer Science 2025-10-07 Honglin Lin, Qizhi Pei, Xin Gao, Zhuoshi Pan, Yu Li, Juntao Li, Conghui He, Lijun Wu

Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs). However, existing training algorithms such as SFT, PPO, and GRPO may not generalize well across unseen…

Artificial Intelligence · Computer Science 2025-10-31 Guohao Sun, Hang Hua, Jian Wang, Jiebo Luo, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao

Accurate chart comprehension represents a critical challenge in advancing multimodal learning systems, as extensive information is compressed into structured visual representations. However, existing vision-language models (VLMs) frequently…

Machine Learning · Computer Science 2026-03-10 Xin Zhang, Xingyu Li, Rongguang Wang, Ruizhong Miao, Zheng Wang, Dan Roth, Chenyang Li

Vision-Language Models (VLMs) have demonstrated exceptional performance in various multi-modal tasks. Recently, there has been an increasing interest in improving the personalization capabilities of VLMs. To better integrate user-provided…

Computer Vision and Pattern Recognition · Computer Science 2025-11-17 Ruichuan An, Kai Zeng, Ming Lu, Sihan Yang, Renrui Zhang, Huitong Ji, Hao Liang, Wentao Zhang

This work explores enabling Chain-of-Thought (CoT) reasoning to link visual cues across multiple images. A straightforward solution is to adapt rule-based reinforcement learning for Vision-Language Models (VLMs). However, such methods…

Computer Vision and Pattern Recognition · Computer Science 2025-06-30 Xi Chen, Mingkang Zhu, Shaoteng Liu, Xiaoyang Wu, Xiaogang Xu, Yu Liu, Xiang Bai, Hengshuang Zhao

Recent multimodal LLMs have shown promise in chart-based visual question answering, but their performance declines sharply on unannotated charts, those requiring precise visual interpretation rather than relying on textual shortcuts. To…

Artificial Intelligence · Computer Science 2026-01-08 Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Sumitra Ganesh, Manuela Veloso

Automated chart summarization is crucial for enhancing data accessibility and enabling efficient information extraction from visual data. While recent advances in visual-language models (VLMs) have demonstrated promise, existing methods…

Computation and Language · Computer Science 2025-02-26 Raymond Choi, Frank Burns, Chase Lawrence

Vision-language models (VLMs) have recently demonstrated strong efficacy as visual assistants that can parse natural queries about the visual content and generate human-like outputs. In this work, we explore the ability of these models to…

Computation and Language · Computer Science 2024-03-21 Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran