Related papers: VisRefiner: Learning from Visual Differences for S…

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Automating the transformation of user interface (UI) designs into front-end code holds significant promise for accelerating software development and democratizing design workflows. While multimodal large language models (MLLMs) can…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, Jiaming Han, Qunzhong Wang, Michael R. Lyu, Xiangyu Yue

pix2code: Generating Code from a Graphical User Interface Screenshot

Transforming a graphical user interface screenshot created by a designer into computer code is a typical task conducted by a developer in order to build customized software, websites, and mobile applications. In this paper, we show that…

Machine Learning · Computer Science 2017-09-20 Tony Beltramelli

Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation

Image-to-code generation tests whether a vision-language model (VLM) can recover the structure of an image enough to express it as executable code. Existing benchmarks either focus on narrow visual domains, depend on paired executable…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Ajay Vikram Periasami, Junlin Wang, Bhuwan Dhingra

Learning UI-to-Code Reverse Generator Using Visual Critic Without Rendering

Automated reverse engineering of HTML/CSS code from UI screenshots is an important yet challenging problem with broad applications in website development and design. In this paper, we propose a novel vision-code transformer (ViCT) composed…

Computer Vision and Pattern Recognition · Computer Science 2023-11-06 Davit Soselia, Khalid Saifullah, Tianyi Zhou

An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities

Code generation aims to automatically generate code snippets of specific programming language according to natural language descriptions. The continuous advancements in deep learning, particularly pre-trained models, have empowered the code…

Software Engineering · Computer Science 2025-01-24 Zezhou Yang, Sirong Chen, Cuiyun Gao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia

Automatically Generating Codes from Graphical Screenshots Based on Deep Autocoder

During software front-end development, the work to convert Graphical User Interface (GUI) image to the corresponding front-end code is an inevitable tedious work. There have been some attempts to make this work to be automatic. However, the…

Machine Learning · Computer Science 2020-07-08 Xiaoling Huang, Feng Liao

DOne: Decoupling Structure and Rendering for High-Fidelity Design-to-Code Generation

While Vision Language Models (VLMs) have shown promise in Design-to-Code generation, they suffer from a "holistic bottleneck", failing to reconcile high-level structural hierarchy with fine-grained visual details, often resulting in layout…

Computer Vision and Pattern Recognition · Computer Science 2026-04-03 Xinhao Huang, Jinke Yu, Wenhao Xu, Zeyi Wen, Ying Zhou, Junzhuo Liu, Junhao Ji, Zulong Chen

ViPer: Visual Personalization of Generative Models via Individual Preference Learning

Different users find different images generated for the same prompt desirable. This gives rise to personalized image generation which involves creating images aligned with an individual's visual preference. Current generative models are,…

Computer Vision and Pattern Recognition · Computer Science 2024-07-25 Sogand Salehi, Mahdi Shafiei, Teresa Yeo, Roman Bachmann, Amir Zamir

UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization

UI-to-code aims to translate UI screenshots into executable front-end code. Despite progress with vision-language models (VLMs), most existing methods formulate UI-to-code as a single-pass generation, which mismatches real-world UI…

Computer Vision and Pattern Recognition · Computer Science 2026-05-07 Zhen Yang, Wenyi Hong, Mingde Xu, Xinyue Fan, Weihan Wang, Jiale Cheng, Xiaotao Gu, Jie Tang

Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation

Recent studies have demonstrated the exceptional potentials of leveraging human preference datasets to refine text-to-image generative models, enhancing the alignment between generated images and textual prompts. Despite these advances,…

Computer Vision and Pattern Recognition · Computer Science 2024-04-24 Xun Wu, Shaohan Huang, Furu Wei

VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis

The notable gap between user-provided and model-preferred prompts poses a significant challenge for generating high-quality images with text-to-image models, compelling the need for prompt engineering. Current studies on prompt engineering…

Computer Vision and Pattern Recognition · Computer Science 2026-03-09 Shiyu Wu, Mingzhen Sun, Weining Wang, Yequan Wang, Jing Liu

Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach

The past decades have witnessed the rapid development of image and video coding techniques in the era of big data. However, the signal fidelity-driven coding pipeline design limits the capability of the existing image/video coding…

Computer Vision and Pattern Recognition · Computer Science 2020-01-13 Yueyu Hu, Shuai Yang, Wenhan Yang, Ling-Yu Duan, Jiaying Liu

Improved Iterative Refinement for Chart-to-Code Generation via Structured Instruction

Recently, multimodal large language models (MLLMs) have attracted increasing research attention due to their powerful visual understanding capabilities. While they have achieved impressive results on various vision tasks, their performance…

Computer Vision and Pattern Recognition · Computer Science 2026-03-18 Chengzhi Xu, Yuyang Wang, Lai Wei, Lichao Sun, Weiran Huang

Screen Parsing: Towards Reverse Engineering of UI Models from Screenshots

Automated understanding of user interfaces (UIs) from their pixels can improve accessibility, enable task automation, and facilitate interface design without relying on developers to comprehensively provide metadata. A first step is to…

Human-Computer Interaction · Computer Science 2021-09-21 Jason Wu, Xiaoyi Zhang, Jeff Nichols, Jeffrey P. Bigham

Image Reconstruction as a Tool for Feature Analysis

Vision encoders are increasingly used in modern applications, from vision-only models to multimodal systems such as vision-language models. Despite their remarkable success, it remains unclear how these architectures represent features…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Eduard Allakhverdov, Dmitrii Tarasov, Elizaveta Goncharova, Andrey Kuznetsov

Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images

Recent advances in generative deep learning have enabled the creation of high-quality synthetic images in text-to-image generation. Prior work shows that fine-tuning a pretrained diffusion model on ImageNet and generating synthetic training…

Computer Vision and Pattern Recognition · Computer Science 2025-01-22 Zhuoran Yu, Chenchen Zhu, Sean Culatana, Raghuraman Krishnamoorthi, Fanyi Xiao, Yong Jae Lee

VinciCoder: Unifying Multimodal Code Generation via Coarse-to-fine Visual Reinforcement Learning

Multimodal code generation has garnered significant interest within the research community. Despite the notable success of recent vision-language models (VLMs) on specialized tasks like chart-to-code generation, their reliance on…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Xuanle Zhao, Deyang Jiang, Zhixiong Zeng, Lei Chen, Haibo Qiu, Jing Huang, Yufeng Zhong, Liming Zheng, Yilin Cao, Lin Ma

Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement

Recent advances in Multimodal Large Language Models (MLLMs) have enabled automated generation of structured layouts from natural language descriptions. Existing methods typically follow a code-only paradigm that generates code to represent…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Junrong Guo, Shancheng Fang, Yadong Qu, Hongtao Xie

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations

A major challenge in visually grounded language generation is to build robust benchmark datasets and models that can generalize well in real-world settings. To do this, it is critical to ensure that our evaluation protocols are correct, and…

Computation and Language · Computer Science 2020-10-09 Wanrong Zhu, Xin Eric Wang, Pradyumna Narayana, Kazoo Sone, Sugato Basu, William Yang Wang

CCRep: Learning Code Change Representations via Pre-Trained Code Model and Query Back

Representing code changes as numeric feature vectors, i.e., code change representations, is usually an essential step to automate many software engineering tasks related to code changes, e.g., commit message generation and just-in-time…

Software Engineering · Computer Science 2023-02-09 Zhongxin Liu, Zhijie Tang, Xin Xia, Xiaohu Yang