Related papers: Plot2Code: A Comprehensive Benchmark for Evaluatin…

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

Generative AI has made rapid advancements in recent years, achieving unprecedented capabilities in multimodal understanding and code generation. This can enable a new paradigm of front-end development in which multimodal large language…

Computation and Language · Computer Science 2025-02-11 Chenglei Si, Yanzhe Zhang, Ryan Li, Zhengyuan Yang, Ruibo Liu, Diyi Yang

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text

Automated data visualization plays a crucial role in simplifying data interpretation, enhancing decision-making, and improving efficiency. While large language models (LLMs) have shown promise in generating visualizations from natural…

Computation and Language · Computer Science 2025-07-29 Mizanur Rahman, Md Tahmid Rahman Laskar, Shafiq Joty, Enamul Hoque

RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To…

Computation and Language · Computer Science 2026-03-30 Jiajun Zhang, Yuying Li, Zhixun Li, Xingyu Guo, Jingzhuo Wu, Leqi Zheng, Yiran Yang, Jianke Zhang, Qingbin Li, Shannan Yan, Zhetong Li, Changguo Jia, Junfei Wu, Zilei Wang, Qiang Liu, Liang Wang

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered unparalleled attention, due to their superior performance in visual contexts. However, their capabilities in visual math problem-solving remain insufficiently…

Computer Vision and Pattern Recognition · Computer Science 2024-08-20 Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li

VisCoder2: Building Multi-Language Visualization Coding Agents

Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable…

Software Engineering · Computer Science 2026-04-09 Yuansheng Ni, Songcheng Cai, Xiangchao Chen, Jiarong Liang, Zhiheng Lyu, Jiaqi Deng, Kai Zou, Ping Nie, Fei Yuan, Xiang Yue, Wenhu Chen

Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities

This paper introduces Code-Vision, a benchmark designed to evaluate the logical understanding and code generation capabilities of Multimodal Large Language Models (MLLMs). It challenges MLLMs to generate a correct program that fulfills…

Computation and Language · Computer Science 2025-02-18 Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu, Keyuan Cheng, Yuxin Zuo, Kai Tian, Jingwei Song, Junting Lu, Wenhui Hu, Xueyang Liu

MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning

Recent progress in Multi-modal Large Language Models (MLLMs) has enabled step-by-step multi-modal mathematical reasoning by performing visual operations based on the textual instructions. A promising approach uses code as an intermediate…

Computation and Language · Computer Science 2025-11-06 Xiaoyuan Li, Moxin Li, Wenjie Wang, Rui Men, Yichang Zhang, Fuli Feng, Dayiheng Liu

From Charts to Code: A Hierarchical Benchmark for Multimodal Models

We introduce Chart2Code, a new benchmark for evaluating the chart understanding and code generation capabilities of large multimodal models (LMMs). Chart2Code is explicitly designed from a user-driven perspective, capturing diverse…

Software Engineering · Computer Science 2026-04-21 Jiahao Tang, Henry Hengyuan Zhao, Lijian Wu, Zijian Zhang, Yifei Tao, Dongxing Mao, Yang Wan, Jingru Tan, Min Zeng, Min Li, Alex Jinpeng Wang

PlotGen-Bench: Evaluating VLMs on Generating Visualization Code from Diverse Plots across Multiple Libraries

Recent advances in vision-language models (VLMs) have expanded their multimodal code generation capabilities, yet their ability to generate executable visualization code from plots, especially for complex 3D, animated, plot-to-plot…

Human-Computer Interaction · Computer Science 2026-01-21 Yi Zhao, Zhen Yang, Shuaiqi Duan, Wenmeng Yu, Zhe Su, Jibing Gong, Jie Tang

Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Ajay Vikram Periasami, Junlin Wang, Bhuwan Dhingra

MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems

Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models…

Computation and Language · Computer Science 2024-09-27 Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma

PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation

The ability of large language models (LLMs) to interpret visual representations of data is crucial for advancing their application in data analysis and decision-making processes. This paper presents a novel synthetic dataset designed to…

Computation and Language · Computer Science 2024-09-05 Aneta Pawelec, Victoria Sara Wesołowska, Zuzanna Bączek, Piotr Sankowski

VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models

Multimodal large language models (MLLMs) have significantly advanced the integration of visual and textual understanding. However, their ability to generate code from multimodal inputs remains limited. In this work, we introduce VisCodex, a…

Computation and Language · Computer Science 2025-08-14 Lingjie Jiang, Shaohan Huang, Xun Wu, Yixia Li, Dongdong Zhang, Furu Wei

Flow2Code: Evaluating Large Language Models for Flowchart-based Code Generation Capability

While large language models (LLMs) show promise in code generation, existing benchmarks neglect flowchart-based code generation. To promote further research on flowchart-based code generation, this work presents Flow2Code, a novel…

Software Engineering · Computer Science 2025-06-04 Mengliang He, Jiayi Zeng, Yankai Jiang, Wei Zhang, Zeming Liu, Xiaoming Shi, Aimin Zhou

Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generation

We present Omni-I2C, a comprehensive benchmark designed to evaluate the capability of Large Multimodal Models (LMMs) in converting complex, structured digital graphics into executable code. We argue that this task represents a non-trivial…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Jiawei Zhou, Chi Zhang, Xiang Feng, Qiming Zhang, Haibo Qiu, Lihuo He, Dengpan Ye, Xinbo Gao, Jing Zhang

SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

Comprehending text-rich visual content is paramount for the practical application of Multimodal Large Language Models (MLLMs), since text-rich scenarios, which are characterized by the presence of extensive…

Computer Vision and Pattern Recognition · Computer Science 2024-04-26 Bohao Li, Yuying Ge, Yi Chen, Yixiao Ge, Ruimao Zhang, Ying Shan

DesignProbe: A Graphic Design Benchmark for Multimodal Large Language Models

A well-executed graphic design typically achieves harmony at two levels, from the fine-grained design elements (color, font and layout) to the overall design. This complexity makes the comprehension of graphic design challenging, for it…

Computer Vision and Pattern Recognition · Computer Science 2024-04-24 Jieru Lin, Danqing Huang, Tiejun Zhao, Dechen Zhan, Chin-Yew Lin

Cross-Modal Consistency in Multimodal Large Language Models

Recent developments in multimodal methodologies have marked the beginning of an exciting era for models adept at processing diverse data types, encompassing text, audio, and visual content. Models like GPT-4V, which merge computer vision…

Computation and Language · Computer Science 2024-11-15 Xiang Zhang, Senyu Li, Ning Shi, Bradley Hauer, Zijun Wu, Grzegorz Kondrak, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

SEED-Bench-2: Benchmarking Multimodal Large Language Models

Multimodal large language models (MLLMs), building upon the foundation of powerful large language models (LLMs), have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal…

Computer Vision and Pattern Recognition · Computer Science 2023-11-30 Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan