Related papers: Web2Code: A Large-scale Webpage-to-Code Dataset an…

200 papers

Automatically generating webpage code from webpage designs can significantly reduce the workload of front-end developers, and recent Multimodal Large Language Models (MLLMs) have shown promising potential in this area. However, our…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Yi Gui, Zhen Li, Yao Wan, Yemin Shi, Hongyu Zhang, Yi Su, Bohua Chen, Dongping Chen, Siyuan Wu, Xing Zhou, Wenbin Jiang, Hai Jin, Xiangliang Zhang

Generative AI has made rapid advancements in recent years, achieving unprecedented capabilities in multimodal understanding and code generation. This can enable a new paradigm of front-end development in which multimodal large language…

Computation and Language · Computer Science 2025-02-11 Chenglei Si, Yanzhe Zhang, Ryan Li, Zhengyuan Yang, Ruibo Liu, Diyi Yang

Converting user interfaces into code (UI2Code) is a crucial step in website development, which is time-consuming and labor-intensive. The automation of UI2Code is essential to streamline this task, beneficial for improving the development…

Software Engineering · Computer Science 2025-06-13 Fan Wu, Cuiyun Gao, Shuqing Li, Xin-Cheng Wen, Qing Liao

Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance on the design-to-code task, i.e., generating UI code from UI mock-ups. However, existing benchmarks only contain static web pages for evaluation and ignore…

Software Engineering · Computer Science 2026-03-03 Jingyu Xiao, Yuxuan Wan, Yintong Huo, Zixin Wang, Xinyi Xu, Wenxuan Wang, Zhiyao Xu, Yuhang Wang, Michael R. Lyu

We present WebMMU, a multilingual benchmark that evaluates three core web tasks: (1) website visual question answering, (2) code editing involving HTML/CSS/JavaScript, and (3) mockup-to-code generation. Unlike prior benchmarks that treat…

With the rapid advancement of Generative AI technology, Multimodal Large Language Models (MLLMs) have the potential to act as AI software engineers capable of executing complex web application development. Considering that the model requires…

Computation and Language · Computer Science 2025-06-10 Zhiyu Lin, Zhengda Zhou, Zhiyuan Zhao, Tianrui Wan, Yilun Ma, Junyu Gao, Xuelong Li

User interface to code (UI2Code) aims to generate executable code that can faithfully reconstruct a given input UI. Prior work focuses largely on web pages and mobile screens, leaving app widgets underexplored. Unlike web or mobile UIs with…

Computer Vision and Pattern Recognition · Computer Science 2026-03-27 Houston H. Zhang, Tao Zhang, Baoze Lin, Yuanqi Xue, Yincheng Zhu, Huan Liu, Li Gu, Linfeng Ye, Ziqiang Wang, Xinxin Zuo, Yang Wang, Yuanhao Yu, Zhixiang Chi

The remarkable progress of Multi-modal Large Language Models (MLLMs) has attracted significant attention due to their superior performance in visual contexts. However, their capabilities in turning visual figures into executable code have not…

Computation and Language · Computer Science 2024-05-14 Chengyue Wu, Yixiao Ge, Qiushan Guo, Jiahao Wang, Zhixuan Liang, Zeyu Lu, Ying Shan, Ping Luo

We introduce Chart2Code, a new benchmark for evaluating the chart understanding and code generation capabilities of large multimodal models (LMMs). Chart2Code is explicitly designed from a user-driven perspective, capturing diverse…

Software Engineering · Computer Science 2026-04-21 Jiahao Tang, Henry Hengyuan Zhao, Lijian Wu, Zijian Zhang, Yifei Tao, Dongxing Mao, Yang Wan, Jingru Tan, Min Zeng, Min Li, Alex Jinpeng Wang

Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To…

Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts. While recent developments in Large Multimodal Models…

Computation and Language · Computer Science 2024-09-27 Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma

Text-rich visual understanding, the ability to process environments where dense textual content is integrated with visuals, is crucial for multimodal large language models (MLLMs) to interact effectively with structured environments. To…

Computer Vision and Pattern Recognition · Computer Science 2024-11-07 Junpeng Liu, Tianyue Ou, Yifan Song, Yuxiao Qu, Wai Lam, Chenyan Xiong, Wenhu Chen, Graham Neubig, Xiang Yue

Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in chart understanding tasks. However, interpreting charts with textual descriptions often leads to information loss, as it fails to fully capture the dense…

Artificial Intelligence · Computer Science 2025-07-03 Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Zhiyuan Liu, Maosong Sun

The Design2Code problem, which involves converting digital designs into functional source code, is a significant challenge in software development due to its complexity and time-consuming nature. Traditional approaches often struggle with…

Machine Learning · Computer Science 2025-04-29 Tung D. Vu, Chung Hoang, Truong-Son Hy

Multimodal large language models (MLLMs) have significantly advanced the integration of visual and textual understanding. However, their ability to generate code from multimodal inputs remains limited. In this work, we introduce VisCodex, a…

Computation and Language · Computer Science 2025-08-14 Lingjie Jiang, Shaohan Huang, Xun Wu, Yixia Li, Dongdong Zhang, Furu Wei

Multimodal Large Language Models (MLLMs) have shown promise in web-related tasks, but evaluating their performance in the web domain remains a challenge due to the lack of comprehensive benchmarks. Existing benchmarks are either designed…

Computation and Language · Computer Science 2024-04-10 Junpeng Liu, Yifan Song, Bill Yuchen Lin, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue

Using vision-language models (VLMs) in web development presents a promising strategy to increase efficiency and unblock no-code solutions: by providing a screenshot or a sketch of a UI, a VLM could generate the code to reproduce it, for…

Human-Computer Interaction · Computer Science 2024-03-15 Hugo Laurençon, Léo Tronchon, Victor Sanh

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Ajay Vikram Periasami, Junlin Wang, Bhuwan Dhingra

While large language models (LLMs) show promise in code generation, existing benchmarks neglect flowchart-based code generation. To promote further research on flowchart-based code generation, this work presents Flow2Code, a novel…

Software Engineering · Computer Science 2025-06-04 Mengliang He, Jiayi Zeng, Yankai Jiang, Wei Zhang, Zeming Liu, Xiaoming Shi, Aimin Zhou

Front-end engineering involves a complex workflow where engineers conceptualize designs, translate them into code, and iteratively refine the implementation. While recent benchmarks primarily focus on converting visual designs to code, we…

Computation and Language · Computer Science 2025-05-27 Haoyu Sun, Huichen Will Wang, Jiawei Gu, Linjie Li, Yu Cheng