Related papers: Visual Programmability: A Guide for Code-as-Though…

CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images

Recent advances in Large Language Models (LLMs) and Vision Language Models (VLMs) have shown significant progress in mathematical reasoning, yet they still face a critical bottleneck with problems requiring visual assistance, such as…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-14 · Chengqi Duan, Kaiyue Sun, Rongyao Fang, Manyuan Zhang, Yan Feng, Ying Luo, Yufang Liu, Ke Wang, Peng Pei, Xunliang Cai, Hongsheng Li, Yi Ma, Xihui Liu

Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner

Chart reasoning presents unique challenges due to its inherent complexity, requiring precise numerical comprehension, multi-level visual understanding, and logical inference across interconnected data elements. Existing vision-language…

Artificial Intelligence · Computer Science · 2026-03-17 · Lei Chen, Xuanle Zhao, Zhixiong Zeng, Jing Huang, Yufeng Zhong, Lin Ma

Distill Visual Chart Reasoning Ability from LLMs to MLLMs

Solving complex chart Q&A tasks requires advanced visual reasoning abilities in multimodal large language models (MLLMs), including recognizing key information from visual inputs and conducting reasoning over it. While fine-tuning MLLMs for…

Computation and Language · Computer Science · 2025-09-03 · Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang

VisDoT: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought

Large vision-language models (LVLMs) struggle to reliably detect visual primitives in charts and align them with semantic representations, which severely limits their performance on complex visual reasoning. This lack of perceptual…

Artificial Intelligence · Computer Science · 2026-03-13 · Eunsoo Lee, Jeongwoo Lee, Minki Hong, Jangho Choi, Jihie Kim

Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models

The recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence requiring robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must integrate with visual comprehension,…

Artificial Intelligence · Computer Science · 2026-04-06 · Yunfei Bai, Amit Dhanda, Shekhar Jain

Do Vision-Language Models See Visualizations Like Humans? Alignment in Chart Categorization

Vision-language models (VLMs) hold promise for enhancing visualization tools, but effective human-AI collaboration hinges on a shared perceptual understanding of visual content. Prior studies assessed VLM visualization literacy through…

Human-Computer Interaction · Computer Science · 2025-11-10 · Péter Ferenc Gyarmati, Manfred Klaffenböck, Laura Koesten, Torsten Möller

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought

Chain-of-Thought (CoT) prompting has proven highly effective for enhancing complex reasoning in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs). Yet, it struggles in complex spatial reasoning tasks. Nonetheless,…

Computation and Language · Computer Science · 2025-01-14 · Chengzu Li, Wenshan Wu, Huanyu Zhang, Yan Xia, Shaoguang Mao, Li Dong, Ivan Vulić, Furu Wei

Look as You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning

Aiming to identify precise evidence sources from visual documents, visual evidence attribution for visual document retrieval-augmented generation (VD-RAG) ensures reliable and verifiable predictions from vision-language models (VLMs) in…

Artificial Intelligence · Computer Science · 2025-12-02 · Shuochen Liu, Pengfei Luo, Chao Zhang, Yuhao Chen, Haotian Zhang, Qi Liu, Xin Kou, Tong Xu, Enhong Chen

Improve Vision Language Model Chain-of-thought Reasoning

Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. However, current training recipes lack robust CoT reasoning data, relying on datasets dominated by short…

Artificial Intelligence · Computer Science · 2024-10-22 · Ruohong Zhang, Bowen Zhang, Yanghao Li, Haotian Zhang, Zhiqing Sun, Zhe Gan, Yinfei Yang, Ruoming Pang, Yiming Yang

Chart-CoCa: Self-Improving Chart Understanding of Vision LMs via Code-Driven Synthesis and Candidate-Conditioned Answering

Vision Language Models (VLMs) often struggle with chart understanding tasks, particularly in accurate chart description and complex reasoning. Synthetic data generation is a promising solution, while usually facing the challenge of noise…

Artificial Intelligence · Computer Science · 2025-08-19 · Gongyao Jiang, Qiong Luo

Chart-RVR: Reinforcement Learning with Verifiable Rewards for Explainable Chart Reasoning

The capabilities of Large Vision-Language Models (LVLMs) have reached state-of-the-art on many visual reasoning tasks, including chart reasoning, yet they still falter on out-of-distribution (OOD) data, and degrade further when asked to…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-14 · Sanchit Sinha, Oana Frunza, Kashif Rasul, Yuriy Nevmyvaka, Aidong Zhang

LaV-CoT: Language-Aware Visual CoT with Multi-Aspect Reward Optimization for Real-World Multilingual VQA

As large vision language models (VLMs) advance, their capabilities in multilingual visual question answering (mVQA) have significantly improved. Chain-of-thought (CoT) reasoning has been proven to enhance interpretability and complex…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-15 · Jing Huang, Zhiya Tan, Shutao Gong, Fanwei Zeng, Joey Tianyi Zhou, Changtao Miao, Huazhe Tan, Weibin Yao, Jianshu Li

Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

Reasoning capability is pivotal for Large Language Models (LLMs) to solve complex tasks, yet achieving reliable and scalable reasoning remains challenging. While Chain-of-Thought (CoT) prompting has become a mainstream approach, existing…

Computation and Language · Computer Science · 2025-10-07 · Honglin Lin, Qizhi Pei, Xin Gao, Zhuoshi Pan, Yu Li, Juntao Li, Conghui He, Lijun Wu

Latent Chain-of-Thought for Visual Reasoning

Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs). However, existing training algorithms such as SFT, PPO, and GRPO may not generalize well across unseen…

Artificial Intelligence · Computer Science · 2025-10-31 · Guohao Sun, Hang Hua, Jian Wang, Jiebo Luo, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao

Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards

Accurate chart comprehension represents a critical challenge in advancing multimodal learning systems, as extensive information is compressed into structured visual representations. However, existing vision-language models (VLMs) frequently…

Machine Learning · Computer Science · 2026-03-10 · Xin Zhang, Xingyu Li, Rongguang Wang, Ruizhong Miao, Zheng Wang, Dan Roth, Chenyang Li

Concept-as-Tree: A Controllable Synthetic Data Framework Makes Stronger Personalized VLMs

Vision-Language Models (VLMs) have demonstrated exceptional performance in various multi-modal tasks. Recently, there has been an increasing interest in improving the personalization capabilities of VLMs. To better integrate user-provided…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-17 · Ruichuan An, Kai Zeng, Ming Lu, Sihan Yang, Renrui Zhang, Huitong Ji, Hao Liang, Wentao Zhang

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

This work explores enabling Chain-of-Thought (CoT) reasoning to link visual cues across multiple images. A straightforward solution is to adapt rule-based reinforcement learning for Vision-Language Models (VLMs). However, such methods…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-30 · Xi Chen, Mingkang Zhu, Shaoteng Liu, Xiaoyang Wu, Xiaogang Xu, Yu Liu, Xiang Bai, Hengshuang Zhao

ChartAgent: A Multimodal Agent for Visually Grounded Reasoning in Complex Chart Question Answering

Recent multimodal LLMs have shown promise in chart-based visual question answering, but their performance declines sharply on unannotated charts, those requiring precise visual interpretation rather than relying on textual shortcuts. To…

Artificial Intelligence · Computer Science · 2026-01-08 · Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Sumitra Ganesh, Manuela Veloso

End-to-End Chart Summarization via Visual Chain-of-Thought in Vision-Language Models

Automated chart summarization is crucial for enhancing data accessibility and enabling efficient information extraction from visual data. While recent advances in visual-language models (VLMs) have demonstrated promise, existing methods…

Computation and Language · Computer Science · 2025-02-26 · Raymond Choi, Frank Burns, Chase Lawrence

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models

Vision-language models (VLMs) have recently demonstrated strong efficacy as visual assistants that can parse natural queries about the visual content and generate human-like outputs. In this work, we explore the ability of these models to…

Computation and Language · Computer Science · 2024-03-21 · Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran