Related papers: Widget2Code: From Visual Widgets to UI Code via Mu…

MLLM-Based UI2Code Automation Guided by UI Layout Information

Converting user interfaces into code (UI2Code) is a crucial step in website development, which is time-consuming and labor-intensive. The automation of UI2Code is essential to streamline this task, beneficial for improving the development…

Software Engineering · Computer Science 2025-06-13 Fan Wu, Cuiyun Gao, Shuqing Li, Xin-Cheng Wen, Qing Liao

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

Prototype2Code: End-to-end Front-end Code Generation from UI Design Prototypes

UI-to-code technology has streamlined the front-end development process, reducing repetitive tasks for engineers. Prior research mainly uses design prototypes as inputs, with the effectiveness of the generated code heavily dependent on these…

Software Engineering · Computer Science 2024-05-09 Shuhong Xiao, Yunnong Chen, Jiazhi Li, Liuqing Chen, Lingyun Sun, Tingting Zhou

Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Ajay Vikram Periasami, Junlin Wang, Bhuwan Dhingra

Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping

Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance on the design-to-code task, i.e., generating UI code from UI mock-ups. However, existing benchmarks only contain static web pages for evaluation and ignore…

Software Engineering · Computer Science 2026-03-03 Jingyu Xiao, Yuxuan Wan, Yintong Huo, Zixin Wang, Xinyi Xu, Wenxuan Wang, Zhiyao Xu, Yuhang Wang, Michael R. Lyu

UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization

UI-to-code aims to translate UI screenshots into executable front-end code. Despite progress with vision-language models (VLMs), most existing methods formulate UI-to-code as a single-pass generation, which mismatches real-world UI…

Computer Vision and Pattern Recognition · Computer Science 2026-05-07 Zhen Yang, Wenyi Hong, Mingde Xu, Xinyue Fan, Weihan Wang, Jiale Cheng, Xiaotao Gu, Jie Tang

Figma2Code: Automating Multimodal Design to Code in the Wild

Front-end development constitutes a substantial portion of software engineering, yet converting design mockups into production-ready User Interface (UI) code remains tedious and costly. While recent work has explored automating this process…

Software Engineering · Computer Science 2026-04-16 Yi Gui, Jiawan Zhang, Yina Wang, Tianran Ma, Yao Wan, Shilin He, Dongping Chen, Zhou Zhao, Wenbin Jiang, Xuanhua Shi, Hai Jin, Philip S Yu

Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping

Sketches are a natural and accessible medium for UI designers to conceptualize early-stage ideas. However, existing research on UI/UX automation often requires high-fidelity inputs like Figma designs or detailed screenshots, limiting…

Computation and Language · Computer Science 2024-10-22 Ryan Li, Yanzhe Zhang, Diyi Yang

Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

Generative AI has made rapid advancements in recent years, achieving unprecedented capabilities in multimodal understanding and code generation. This can enable a new paradigm of front-end development in which multimodal large language…

Computation and Language · Computer Science 2025-02-11 Chenglei Si, Yanzhe Zhang, Ryan Li, Zhengyuan Yang, Ruibo Liu, Diyi Yang

UniCode$^2$: Cascaded Large-scale Codebooks for Unified Multimodal Understanding and Generation

Unified multimodal large language models (MLLMs) have shown promise in jointly advancing multimodal understanding and generation, with visual codebooks discretizing images into tokens for autoregressive modeling. Existing codebook-based…

Computer Vision and Pattern Recognition · Computer Science 2025-07-09 Yanzhe Chen, Huasong Zhong, Yan Li, Zhenheng Yang

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Automating the transformation of user interface (UI) designs into front-end code holds significant promise for accelerating software development and democratizing design workflows. While multimodal large language models (MLLMs) can…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, Jiaming Han, Qunzhong Wang, Michael R. Lyu, Xiangyu Yue

From Charts to Code: A Hierarchical Benchmark for Multimodal Models

We introduce Chart2Code, a new benchmark for evaluating the chart understanding and code generation capabilities of large multimodal models (LMMs). Chart2Code is explicitly designed from a user-driven perspective, capturing diverse…

Software Engineering · Computer Science 2026-04-21 Jiahao Tang, Henry Hengyuan Zhao, Lijian Wu, Zijian Zhang, Yifei Tao, Dongxing Mao, Yang Wan, Jingru Tan, Min Zeng, Min Li, Alex Jinpeng Wang

RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To…

Computation and Language · Computer Science 2026-03-30 Jiajun Zhang, Yuying Li, Zhixun Li, Xingyu Guo, Jingzhuo Wu, Leqi Zheng, Yiran Yang, Jianke Zhang, Qingbin Li, Shannan Yan, Zhetong Li, Changguo Jia, Junfei Wu, Zilei Wang, Qiang Liu, Liang Wang

WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs

Automatically generating webpage code from webpage designs can significantly reduce the workload of front-end developers, and recent Multimodal Large Language Models (MLLMs) have shown promising potential in this area. However, our…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Yi Gui, Zhen Li, Yao Wan, Yemin Shi, Hongyu Zhang, Yi Su, Bohua Chen, Dongping Chen, Siyuan Wu, Xing Zhou, Wenbin Jiang, Hai Jin, Xiangliang Zhang

Code2World: A GUI World Model via Renderable Code Generation

Autonomous GUI agents interact with environments by perceiving interfaces and executing actions. As a virtual sandbox, the GUI World model empowers agents with human-like foresight by enabling action-conditioned prediction. However,…

Computer Vision and Pattern Recognition · Computer Science 2026-02-11 Yuhao Zheng, Li'an Zhong, Yi Wang, Rui Dai, Kaikui Liu, Xiangxiang Chu, Linyuan Lv, Philip Torr, Kevin Qinghong Lin

WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

User interface (UI) development requires translating design mockups into functional code, a process that remains repetitive and labor-intensive. While recent Vision-Language Models (VLMs) automate UI-to-Code generation, they generate only…

Software Engineering · Computer Science 2025-11-11 Mingde Xu, Zhen Yang, Wenyi Hong, Lihang Pan, Xinyue Fan, Yan Wang, Xiaotao Gu, Bin Xu, Jie Tang

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

The remarkable progress of Multi-modal Large Language Models (MLLMs) has attracted significant attention due to their superior performance in visual contexts. However, their capabilities in turning visual figures into executable code have not…

Computation and Language · Computer Science 2024-05-14 Chengyue Wu, Yixiao Ge, Qiushan Guo, Jiahao Wang, Zhixuan Liang, Zeyu Lu, Ying Shan, Ping Luo

World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering

Recent advances in Vision-Language Models (VLMs) and the scarcity of high-quality multi-modal alignment data have inspired numerous studies on synthetic VLM data generation. The conventional norm in VLM data construction uses a mixture…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Jiacong Wang, Bohong Wu, Haiyong Jiang, Xun Zhou, Xin Xiao, Haoyuan Guo, Jun Xiao

ComUICoder: Component-based Reusable UI Code Generation for Complex Websites via Semantic Segmentation and Element-wise Feedback

Multimodal Large Language Models (MLLMs) have demonstrated strong performance on the UI-to-code task, which aims to generate UI code from design mock-ups. However, when applied to long and complex websites, they often struggle with…

Software Engineering · Computer Science 2026-02-24 Jingyu Xiao, Jiantong Qin, Shuoqi Li, Man Ho Lam, Yuxuan Wan, Jen-tse Huang, Yintong Huo, Michael R. Lyu

MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems

Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models…

Computation and Language · Computer Science 2024-09-27 Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma