Related papers: PlotGen-Bench: Evaluating VLMs on Generating Visua…

RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To…

Computation and Language · Computer Science 2026-03-30 Jiajun Zhang, Yuying Li, Zhixun Li, Xingyu Guo, Jingzhuo Wu, Leqi Zheng, Yiran Yang, Jianke Zhang, Qingbin Li, Shannan Yan, Zhetong Li, Changguo Jia, Junfei Wu, Zilei Wang, Qiang Liu, Liang Wang

PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback

Scientific data visualization is pivotal for transforming raw data into comprehensible visual representations, enabling pattern recognition, forecasting, and the presentation of data-driven insights. However, novice users often face…

Computation and Language · Computer Science 2025-02-04 Kanika Goswami, Puneet Mathur, Ryan Rossi, Franck Dernoncourt

VCG-Bench: Towards A Unified Visual-Centric Benchmark for Structured Generation and Editing

Despite the rapid advancements in Vision-Language Models (VLMs), a critical gap remains in their ability to handle structured, controllable diagrammatic tasks essential for professional workflows. Existing methods predominantly rely on…

Computation and Language · Computer Science 2026-05-18 Xiaoyan Su, Peijie Dong, Zhenheng Tang, Song Tang, Yuyao Zhai, Kaitao Lin, Liang Chen, Gai Yuhang, Yuyu Luo, Qiang Wang, Xiaowen Chu

Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code

This paper introduces the human-curated PandasPlotBench dataset, designed to evaluate language models' effectiveness as assistants in visual data exploration. Our benchmark focuses on generating code for visualizing tabular data - such as a…

Software Engineering · Computer Science 2025-02-27 Timur Galimzyanov, Sergey Titov, Yaroslav Golubev, Egor Bogomolov

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

The remarkable progress of Multi-modal Large Language Models (MLLMs) has attracted significant attention due to their superior performance in visual contexts. However, their capabilities in turning visual figures into executable code have not…

Computation and Language · Computer Science 2024-05-14 Chengyue Wu, Yixiao Ge, Qiushan Guo, Jiahao Wang, Zhixuan Liang, Zeyu Lu, Ying Shan, Ping Luo

Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Ajay Vikram Periasami, Junlin Wang, Bhuwan Dhingra

VisCoder2: Building Multi-Language Visualization Coding Agents

Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable…

Software Engineering · Computer Science 2026-04-09 Yuansheng Ni, Songcheng Cai, Xiangchao Chen, Jiarong Liang, Zhiheng Lyu, Jiaqi Deng, Kai Zou, Ping Nie, Fei Yuan, Xiang Yue, Wenhu Chen

ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation

Chart-to-code reconstruction -- the task of recovering executable plotting scripts from chart images -- provides important insights into a model's ability to ground data visualizations in precise, machine-readable form. Yet many existing…

Human-Computer Interaction · Computer Science 2025-07-29 Jovana Kondic, Pengyuan Li, Dhiraj Joshi, Zexue He, Shafiq Abedin, Jennifer Sun, Ben Wiesel, Eli Schwartz, Ahmed Nassar, Bo Wu, Assaf Arbelle, Aude Oliva, Dan Gutfreund, Leonid Karlinsky, Rogerio Feris

VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models

Multimodal large language models (MLLMs) have significantly advanced the integration of visual and textual understanding. However, their ability to generate code from multimodal inputs remains limited. In this work, we introduce VisCodex, a…

Computation and Language · Computer Science 2025-08-14 Lingjie Jiang, Shaohan Huang, Xun Wu, Yixia Li, Dongdong Zhang, Furu Wei

VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation

In the realm of vision models, the primary mode of representation is using pixels to rasterize the visual world. Yet this is not always the best or unique way to represent visual content, especially for designers and artists who depict the…

Computer Vision and Pattern Recognition · Computer Science 2024-08-30 Bocheng Zou, Mu Cai, Jianrui Zhang, Yong Jae Lee

VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?

Vision-Language Models (VLMs) have achieved impressive performance in cross-modal understanding across textual and visual inputs, yet existing benchmarks predominantly focus on pure-text queries. In real-world scenarios, language also…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Qing'an Liu, Juntong Feng, Yuhao Wang, Xinzhe Han, Yujie Cheng, Yue Zhu, Haiwen Diao, Yunzhi Zhuge, Huchuan Lu

PlotCraft: Pushing the Limits of LLMs for Complex and Interactive Data Visualization

Recent Large Language Models (LLMs) have demonstrated remarkable proficiency in code generation. However, their ability to create complex visualizations for scaled and structured data remains largely unevaluated and underdeveloped. To…

Computation and Language · Computer Science 2026-01-16 Jiajun Zhang, Jianke Zhang, Zeyu Cui, Jiaxi Yang, Lei Zhang, Binyuan Hui, Qiang Liu, Zilei Wang, Liang Wang, Junyang Lin

Are LLMs ready for Visualization?

Generative models have received a lot of attention in many areas of academia and the industry. Their capabilities span many areas, from the invention of images given a prompt to the generation of concrete code to solve a certain programming…

Human-Computer Interaction · Computer Science 2024-03-12 Pere-Pau Vázquez

Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities

This paper introduces Code-Vision, a benchmark designed to evaluate the logical understanding and code generation capabilities of Multimodal Large Language Models (MLLMs). It challenges MLLMs to generate a correct program that fulfills…

Computation and Language · Computer Science 2025-02-18 Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu, Keyuan Cheng, Yuxin Zuo, Kai Tian, Jingwei Song, Junting Lu, Wenhui Hu, Xueyang Liu

AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, with code generation emerging as a key area of focus. While numerous benchmarks have been proposed to evaluate their code generation abilities,…

Computation and Language · Computer Science 2025-08-13 Jason Chou, Ao Liu, Yuchi Deng, Zhiying Zeng, Tao Zhang, Haotian Zhu, Jianwei Cai, Yue Mao, Chenchen Zhang, Lingyun Tan, Ziyan Xu, Bohui Zhai, Hengyi Liu, Speed Zhu, Wiggin Zhou, Fengzong Lian

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models

Although large visual-language models (LVLMs) have demonstrated strong performance in multimodal tasks, errors may occasionally arise due to biases during the reasoning process. Recently, reward models (RMs) have become increasingly pivotal…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Jiacheng Ruan, Wenzhen Yuan, Xian Gao, Ye Guo, Daoxin Zhang, Zhe Xu, Yao Hu, Ting Liu, Yuzhuo Fu

DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in automated front-end engineering, e.g., generating UI code from visual designs. However, existing front-end UI code generation benchmarks have the…

Software Engineering · Computer Science 2026-03-17 Jingyu Xiao, Ming Wang, Man Ho Lam, Yuxuan Wan, Junliang Liu, Yintong Huo, Michael R. Lyu

V-GameGym: Visual Game Generation for Code Large Language Models

Code large language models have demonstrated remarkable capabilities in programming tasks, yet current benchmarks primarily focus on single modality rather than visual game development. Most existing code-related benchmarks evaluate syntax…

Software Engineering · Computer Science 2025-09-25 Wei Zhang, Jack Yang, Renshuai Tao, Lingzheng Chai, Shawn Guo, Jiajun Wu, Xiaoming Chen, Ganqu Cui, Ning Ding, Xander Xu, Hu Wei, Bowen Zhou

UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

Significant research efforts have been made to scale and improve vision-language model (VLM) training approaches. Yet, with an ever-growing number of benchmarks, researchers are tasked with the heavy burden of implementing each protocol,…

Computer Vision and Pattern Recognition · Computer Science 2024-08-12 Haider Al-Tahan, Quentin Garrido, Randall Balestriero, Diane Bouchacourt, Caner Hazirbas, Mark Ibrahim

VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models

Multimodal large language models (MLLMs) have enabled a wide range of advanced vision-language applications, including fine-grained object recognition and contextual understanding. When querying specific regions or objects in an image,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-17 Mingjie Xu, Jinpeng Chen, Yuzhi Zhao, Jason Chun Lok Li, Yue Qiu, Zekang Du, Mengyang Wu, Pingping Zhang, Kun Li, Hongzheng Yang, Wenao Ma, Jiaheng Wei, Qinbin Li, Kangcheng Liu, Wenqiang Lei