Related papers: Improved Iterative Refinement for Chart-to-Code Ge…

Boosting Chart-to-Code Generation in MLLM via Dual Preference-Guided Refinement

Translating chart images into executable plotting scripts, referred to as the chart-to-code generation task, requires Multimodal Large Language Models (MLLMs) to perform fine-grained visual parsing, precise code synthesis, and robust…

Computation and Language · Computer Science · 2025-08-21 · Zhihan Zhang, Yixin Cao, Lizi Liao

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in chart understanding tasks. However, interpreting charts with textual descriptions often leads to information loss, as it fails to fully capture the dense…

Artificial Intelligence · Computer Science · 2025-07-03 · Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Zhiyuan Liu, Maosong Sun

ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning

Charts provide visual representations of data and are widely used for analyzing information, addressing queries, and conveying insights to others. Various chart-related downstream tasks have emerged recently, such as question-answering and…

Computation and Language · Computer Science · 2024-03-15 · Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty

MM-ReCoder: Advancing Chart-to-Code Generation with Reinforcement Learning and Self-Correction

Multimodal Large Language Models (MLLMs) have recently demonstrated promising capabilities in multimodal coding tasks such as chart-to-code generation. However, existing methods primarily rely on supervised fine-tuning (SFT), which requires…

Artificial Intelligence · Computer Science · 2026-04-03 · Zitian Tang, Xu Zhang, Jianbo Yuan, Yang Zou, Varad Gunjal, Songyao Jiang, Davide Modolo

ChartLlama: A Multimodal LLM for Chart Understanding and Generation

Multi-modal large language models have demonstrated impressive performances on most vision-language tasks. However, the model generally lacks the understanding capabilities for specific domain data, particularly when it comes to…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-29 · Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang

Chart Specification: Structural Representations for Incentivizing VLM Reasoning in Chart-to-Code Generation

Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images, yet achieving structural fidelity remains challenging. Existing approaches largely rely on supervised fine-tuning, encouraging surface-level…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-12 · Minggui He, Mingchen Dai, Jian Zhang, Yilun Liu, Shimin Tao, Pufan Zeng, Osamu Yoshie, Yuya Ieiri

Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning

Emerging multimodal large language models (MLLMs) exhibit great potential for chart question answering (CQA). Recent efforts primarily focus on scaling up training datasets (i.e., charts, data tables, and question-answer (QA) pairs) through…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-13 · Xingchen Zeng, Haichuan Lin, Yilin Ye, Wei Zeng

On Pre-training of Multimodal Language Models Customized for Chart Understanding

Recent studies customizing Multimodal Large Language Models (MLLMs) for domain-specific tasks have yielded promising results, especially in the field of scientific chart comprehension. These studies generally utilize visual instruction…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-21 · Wan-Cyuan Fan, Yen-Chun Chen, Mengchen Liu, Lu Yuan, Leonid Sigal

ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering

Recently, large language models have shown remarkable reasoning capabilities through long-chain reasoning before responding. However, how to extend this capability to visual reasoning tasks remains an open challenge. Existing multimodal…

Computation and Language · Computer Science · 2025-06-13 · Caijun Jia, Nan Xu, Jingxuan Wei, Qingli Wang, Lei Wang, Bihui Yu, Junnan Zhu

ChartAdapter: Large Vision-Language Model for Chart Summarization

Chart summarization, which focuses on extracting key information from charts and interpreting it in natural language, is crucial for generating and delivering insights through effective and accessible data analysis. Traditional methods for…

Multimedia · Computer Science · 2024-12-31 · Peixin Xu, Yujuan Ding, Wenqi Fan

ChartCards: A Chart-Metadata Generation Framework for Multi-Task Chart Understanding

The emergence of Multi-modal Large Language Models (MLLMs) presents new opportunities for chart understanding. However, due to the fine-grained nature of these tasks, applying MLLMs typically requires large, high-quality datasets for…

Computation and Language · Computer Science · 2025-10-08 · Yifan Wu, Lutao Yan, Leixian Shen, Yinan Mei, Jiannan Wang, Yuyu Luo

MLLM-DataEngine: An Iterative Refinement Approach for MLLM

Despite the great advance of Multimodal Large Language Models (MLLMs) in both instruction dataset building and benchmarking, the independence of training and evaluation makes current MLLMs hard to further improve their capability under the…

Machine Learning · Computer Science · 2023-09-12 · Zhiyuan Zhao, Linke Ouyang, Bin Wang, Siyuan Huang, Pan Zhang, Xiaoyi Dong, Jiaqi Wang, Conghui He

Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation

While reinforcement learning (RL) has proven highly effective for general reasoning in vision-language models, its application to tasks requiring deep understanding of information-rich images and structured output generation remains…

Artificial Intelligence · Computer Science · 2026-03-17 · Lei Chen, Xuanle Zhao, Zhixiong Zeng, Jing Huang, Liming Zheng, Yufeng Zhong, Lin Ma

Do MLLMs Really Understand the Charts?

Although Multimodal Large Language Models (MLLMs) have demonstrated increasingly impressive performance in chart understanding, most of them exhibit alarming hallucinations and significant performance degradation when handling non-annotated…

Computation and Language · Computer Science · 2025-12-16 · Xiao Zhang, Dongyuan Li, Liuyu Xiang, Yao Zhang, Cheng Zhong, Zhaofeng He

ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation

Chart-to-code reconstruction, the task of recovering executable plotting scripts from chart images, provides important insights into a model's ability to ground data visualizations in precise, machine-readable form. Yet many existing…

Human-Computer Interaction · Computer Science · 2025-07-29 · Jovana Kondic, Pengyuan Li, Dhiraj Joshi, Zexue He, Shafiq Abedin, Jennifer Sun, Ben Wiesel, Eli Schwartz, Ahmed Nassar, Bo Wu, Assaf Arbelle, Aude Oliva, Dan Gutfreund, Leonid Karlinsky, Rogerio Feris

CharTide: Data-Centric Chart-to-Code Generation via Tri-Perspective Tuning and Inquiry-Driven Evolution

Chart-to-code generation demands strict visual precision and syntactic correctness from Vision-Language Models (VLMs). However, existing approaches are fundamentally constrained by data-centric limitations: despite the availability of…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-27 · Xiangxi Zheng, Kuang He, Jiayi Hu, Ping Yu, Rui Yan, Yuan Yao, Peng Hou, Anxiang Zeng, Alex Jinpeng Wang

ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering

Chart question answering (ChartQA) tasks play a critical role in interpreting and extracting insights from visualization charts. While recent advancements in multimodal large language models (MLLMs) like GPT-4o have shown promise in…

Computation and Language · Computer Science · 2024-11-07 · Yifan Wu, Lutao Yan, Leixian Shen, Yunhai Wang, Nan Tang, Yuyu Luo

Distill Visual Chart Reasoning Ability from LLMs to MLLMs

Solving complex chart Q&A tasks requires advanced visual reasoning abilities in multimodal large language models (MLLMs), including recognizing key information from visual inputs and conducting reasoning over it. While fine-tuning MLLMs for…

Computation and Language · Computer Science · 2025-09-03 · Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang

Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement

Chart understanding is a quintessential information fusion task, requiring the seamless integration of graphical and textual data to extract meaning. The advent of Multimodal Large Language Models (MLLMs) has revolutionized this domain, yet…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-12 · Zhihang Yi, Jian Zhao, Jiancheng Lv, Tao Wang

Chart-RL: Generalized Chart Comprehension via Reinforcement Learning with Verifiable Rewards

Accurate chart comprehension represents a critical challenge in advancing multimodal learning systems, as extensive information is compressed into structured visual representations. However, existing vision-language models (VLMs) frequently…

Machine Learning · Computer Science · 2026-03-10 · Xin Zhang, Xingyu Li, Rongguang Wang, Ruizhong Miao, Zheng Wang, Dan Roth, Chenyang Li