Related papers: Plot2Code: A Comprehensive Benchmark for Evaluatin…

200 papers

Generative AI has made rapid advancements in recent years, achieving unprecedented capabilities in multimodal understanding and code generation. This can enable a new paradigm of front-end development in which multimodal large language…

Computation and Language · Computer Science 2025-02-11 Chenglei Si, Yanzhe Zhang, Ryan Li, Zhengyuan Yang, Ruibo Liu, Diyi Yang

Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding…

Automated data visualization plays a crucial role in simplifying data interpretation, enhancing decision-making, and improving efficiency. While large language models (LLMs) have shown promise in generating visualizations from natural…

Computation and Language · Computer Science 2025-07-29 Mizanur Rahman, Md Tahmid Rahman Laskar, Shafiq Joty, Enamul Hoque

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To…

The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered unparalleled attention, due to their superior performance in visual contexts. However, their capabilities in visual math problem-solving remain insufficiently…

Computer Vision and Pattern Recognition · Computer Science 2024-08-20 Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li

Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable…

Software Engineering · Computer Science 2026-04-09 Yuansheng Ni, Songcheng Cai, Xiangchao Chen, Jiarong Liang, Zhiheng Lyu, Jiaqi Deng, Kai Zou, Ping Nie, Fei Yuan, Xiang Yue, Wenhu Chen

This paper introduces Code-Vision, a benchmark designed to evaluate the logical understanding and code generation capabilities of Multimodal Large Language Models (MLLMs). It challenges MLLMs to generate a correct program that fulfills…

Computation and Language · Computer Science 2025-02-18 Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu, Keyuan Cheng, Yuxin Zuo, Kai Tian, Jingwei Song, Junting Lu, Wenhui Hu, Xueyang Liu

Recent progress in Multi-modal Large Language Models (MLLMs) has enabled step-by-step multi-modal mathematical reasoning by performing visual operations based on the textual instructions. A promising approach uses code as an intermediate…

Computation and Language · Computer Science 2025-11-06 Xiaoyuan Li, Moxin Li, Wenjie Wang, Rui Men, Yichang Zhang, Fuli Feng, Dayiheng Liu

We introduce Chart2Code, a new benchmark for evaluating the chart understanding and code generation capabilities of large multimodal models (LMMs). Chart2Code is explicitly designed from a user-driven perspective, capturing diverse…

Software Engineering · Computer Science 2026-04-21 Jiahao Tang, Henry Hengyuan Zhao, Lijian Wu, Zijian Zhang, Yifei Tao, Dongxing Mao, Yang Wan, Jingru Tan, Min Zeng, Min Li, Alex Jinpeng Wang

Recent advances in vision-language models (VLMs) have expanded their multimodal code generation capabilities, yet their ability to generate executable visualization code from plots, especially for complex 3D, animated, plot-to-plot…

Human-Computer Interaction · Computer Science 2026-01-21 Yi Zhao, Zhen Yang, Shuaiqi Duan, Wenmeng Yu, Zhe Su, Jibing Gong, Jie Tang

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Ajay Vikram Periasami, Junlin Wang, Bhuwan Dhingra

Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models…

Computation and Language · Computer Science 2024-09-27 Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma

The ability of large language models (LLMs) to interpret visual representations of data is crucial for advancing their application in data analysis and decision-making processes. This paper presents a novel synthetic dataset designed to…

Computation and Language · Computer Science 2024-09-05 Aneta Pawelec, Victoria Sara Wesołowska, Zuzanna Bączek, Piotr Sankowski

Multimodal large language models (MLLMs) have significantly advanced the integration of visual and textual understanding. However, their ability to generate code from multimodal inputs remains limited. In this work, we introduce VisCodex, a…

Computation and Language · Computer Science 2025-08-14 Lingjie Jiang, Shaohan Huang, Xun Wu, Yixia Li, Dongdong Zhang, Furu Wei

While large language models (LLMs) show promise in code generation, existing benchmarks neglect flowchart-based code generation. To promote further research on this task, this work presents Flow2Code, a novel…

Software Engineering · Computer Science 2025-06-04 Mengliang He, Jiayi Zeng, Yankai Jiang, Wei Zhang, Zeming Liu, Xiaoming Shi, Aimin Zhou

We present Omni-I2C, a comprehensive benchmark designed to evaluate the capability of Large Multimodal Models (LMMs) in converting complex, structured digital graphics into executable code. We argue that this task represents a non-trivial…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Jiawei Zhou, Chi Zhang, Xiang Feng, Qiming Zhang, Haibo Qiu, Lihuo He, Dengpan Ye, Xinbo Gao, Jing Zhang

Comprehending text-rich visual content is paramount for the practical application of Multimodal Large Language Models (MLLMs), since text-rich scenarios are ubiquitous in the real world, which are characterized by the presence of extensive…

Computer Vision and Pattern Recognition · Computer Science 2024-04-26 Bohao Li, Yuying Ge, Yi Chen, Yixiao Ge, Ruimao Zhang, Ying Shan

A well-executed graphic design typically achieves harmony at two levels, from the fine-grained design elements (color, font, and layout) to the overall design. This complexity makes the comprehension of graphic design challenging, for it…

Computer Vision and Pattern Recognition · Computer Science 2024-04-24 Jieru Lin, Danqing Huang, Tiejun Zhao, Dechen Zhan, Chin-Yew Lin

Recent developments in multimodal methodologies have marked the beginning of an exciting era for models adept at processing diverse data types, encompassing text, audio, and visual content. Models like GPT-4V, which merge computer vision…

Computation and Language · Computer Science 2024-11-15 Xiang Zhang, Senyu Li, Ning Shi, Bradley Hauer, Zijun Wu, Grzegorz Kondrak, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

Multimodal large language models (MLLMs), building upon the foundation of powerful large language models (LLMs), have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal…

Computer Vision and Pattern Recognition · Computer Science 2023-11-30 Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan