Related papers: Improved Iterative Refinement for Chart-to-Code Ge…

200 papers

Translating chart images into executable plotting scripts, referred to as the chart-to-code generation task, requires Multimodal Large Language Models (MLLMs) to perform fine-grained visual parsing, precise code synthesis, and robust…

Computation and Language · Computer Science 2025-08-21 Zhihan Zhang, Yixin Cao, Lizi Liao

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in chart understanding tasks. However, interpreting charts with textual descriptions often leads to information loss, as it fails to fully capture the dense…

Artificial Intelligence · Computer Science 2025-07-03 Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Zhiyuan Liu, Maosong Sun

Charts provide visual representations of data and are widely used for analyzing information, addressing queries, and conveying insights to others. Various chart-related downstream tasks have emerged recently, such as question-answering and…

Computation and Language · Computer Science 2024-03-15 Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty

Multimodal Large Language Models (MLLMs) have recently demonstrated promising capabilities in multimodal coding tasks such as chart-to-code generation. However, existing methods primarily rely on supervised fine-tuning (SFT), which requires…

Artificial Intelligence · Computer Science 2026-04-03 Zitian Tang, Xu Zhang, Jianbo Yuan, Yang Zou, Varad Gunjal, Songyao Jiang, Davide Modolo

Multi-modal large language models have demonstrated impressive performances on most vision-language tasks. However, the model generally lacks the understanding capabilities for specific domain data, particularly when it comes to…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang

Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images, yet achieving structural fidelity remains challenging. Existing approaches largely rely on supervised fine-tuning, encouraging surface-level…

Computer Vision and Pattern Recognition · Computer Science 2026-02-12 Minggui He, Mingchen Dai, Jian Zhang, Yilun Liu, Shimin Tao, Pufan Zeng, Osamu Yoshie, Yuya Ieiri

Emerging multimodal large language models (MLLMs) exhibit great potential for chart question answering (CQA). Recent efforts primarily focus on scaling up training datasets (i.e., charts, data tables, and question-answer (QA) pairs) through…

Computer Vision and Pattern Recognition · Computer Science 2024-08-13 Xingchen Zeng, Haichuan Lin, Yilin Ye, Wei Zeng

Recent studies customizing Multimodal Large Language Models (MLLMs) for domain-specific tasks have yielded promising results, especially in the field of scientific chart comprehension. These studies generally utilize visual instruction…

Computer Vision and Pattern Recognition · Computer Science 2025-07-21 Wan-Cyuan Fan, Yen-Chun Chen, Mengchen Liu, Lu Yuan, Leonid Sigal

Recently, large language models have shown remarkable reasoning capabilities through long-chain reasoning before responding. However, how to extend this capability to visual reasoning tasks remains an open challenge. Existing multimodal…

Computation and Language · Computer Science 2025-06-13 Caijun Jia, Nan Xu, Jingxuan Wei, Qingli Wang, Lei Wang, Bihui Yu, Junnan Zhu

Chart summarization, which focuses on extracting key information from charts and interpreting it in natural language, is crucial for generating and delivering insights through effective and accessible data analysis. Traditional methods for…

Multimedia · Computer Science 2024-12-31 Peixin Xu, Yujuan Ding, Wenqi Fan

The emergence of Multi-modal Large Language Models (MLLMs) presents new opportunities for chart understanding. However, due to the fine-grained nature of these tasks, applying MLLMs typically requires large, high-quality datasets for…

Computation and Language · Computer Science 2025-10-08 Yifan Wu, Lutao Yan, Leixian Shen, Yinan Mei, Jiannan Wang, Yuyu Luo

Despite the great advance of Multimodal Large Language Models (MLLMs) in both instruction dataset building and benchmarking, the independence of training and evaluation makes current MLLMs hard to further improve their capability under the…

Machine Learning · Computer Science 2023-09-12 Zhiyuan Zhao, Linke Ouyang, Bin Wang, Siyuan Huang, Pan Zhang, Xiaoyi Dong, Jiaqi Wang, Conghui He

While reinforcement learning (RL) has proven highly effective for general reasoning in vision-language models, its application to tasks requiring deep understanding of information-rich images and structured output generation remains…

Artificial Intelligence · Computer Science 2026-03-17 Lei Chen, Xuanle Zhao, Zhixiong Zeng, Jing Huang, Liming Zheng, Yufeng Zhong, Lin Ma

Although Multimodal Large Language Models (MLLMs) have demonstrated increasingly impressive performance in chart understanding, most of them exhibit alarming hallucinations and significant performance degradation when handling non-annotated…

Computation and Language · Computer Science 2025-12-16 Xiao Zhang, Dongyuan Li, Liuyu Xiang, Yao Zhang, Cheng Zhong, Zhaofeng He

Chart-to-code reconstruction -- the task of recovering executable plotting scripts from chart images -- provides important insights into a model's ability to ground data visualizations in precise, machine-readable form. Yet many existing…

Chart-to-code generation demands strict visual precision and syntactic correctness from Vision-Language Models (VLMs). However, existing approaches are fundamentally constrained by data-centric limitations: despite the availability of…

Computer Vision and Pattern Recognition · Computer Science 2026-04-27 Xiangxi Zheng, Kuang He, Jiayi Hu, Ping Yu, Rui Yan, Yuan Yao, Peng Hou, Anxiang Zeng, Alex Jinpeng Wang

Chart question answering (ChartQA) tasks play a critical role in interpreting and extracting insights from visualization charts. While recent advancements in multimodal large language models (MLLMs) like GPT-4o have shown promise in…

Computation and Language · Computer Science 2024-11-07 Yifan Wu, Lutao Yan, Leixian Shen, Yunhai Wang, Nan Tang, Yuyu Luo

Solving complex chart Q&A tasks requires advanced visual reasoning abilities in multimodal large language models (MLLMs), including recognizing key information from visual inputs and conducting reasoning over it. While fine-tuning MLLMs for…

Computation and Language · Computer Science 2025-09-03 Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang

Chart understanding is a quintessential information fusion task, requiring the seamless integration of graphical and textual data to extract meaning. The advent of Multimodal Large Language Models (MLLMs) has revolutionized this domain, yet…

Computer Vision and Pattern Recognition · Computer Science 2026-02-12 Zhihang Yi, Jian Zhao, Jiancheng Lv, Tao Wang

Accurate chart comprehension represents a critical challenge in advancing multimodal learning systems, as extensive information is compressed into structured visual representations. However, existing vision-language models (VLMs) frequently…

Machine Learning · Computer Science 2026-03-10 Xin Zhang, Xingyu Li, Rongguang Wang, Ruizhong Miao, Zheng Wang, Dan Roth, Chenyang Li