VSA: Visual-Structural Alignment for UI-to-Code

Information Retrieval 2025-12-24 v1

Abstract

Automating user interface development can accelerate software delivery by reducing labor-intensive manual implementation. Despite advances in Large Multimodal Models for design-to-code translation, existing methods predominantly yield unstructured, flat codebases that are incompatible with component-oriented libraries such as React or Angular. Such outputs typically exhibit low cohesion and high coupling, which complicates long-term maintenance. In this paper, we propose VSA (Visual-Structural Alignment), a multi-stage paradigm designed to synthesize organized frontend assets through visual-structural alignment. Our approach first employs a spatial-aware transformer to reconstruct the visual input into a hierarchical tree representation. Moving beyond basic layout extraction, we integrate an algorithmic pattern-matching layer that identifies recurring UI motifs and encapsulates them into modular templates. These templates are then processed by a schema-driven synthesis engine, which ensures that the Large Language Model generates type-safe, prop-drilled components suitable for production environments. Experimental results indicate that our framework yields a substantial improvement in code modularity and architectural consistency over state-of-the-art baselines, bridging the gap between raw pixels and scalable software engineering.
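To make the pattern-matching step concrete, the following is a minimal sketch of one way recurring UI motifs could be detected in a hierarchical tree: each subtree is reduced to a canonical structural signature, and subtrees sharing a signature are grouped as candidates for a reusable component. The `UINode` type, the `signature` and `find_motifs` helpers, and the `min_count` threshold are hypothetical names chosen for illustration; the abstract does not specify the tree schema or the matching algorithm VSA actually uses.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UINode:
    """Hypothetical node of a hierarchical UI tree (not the paper's schema)."""
    kind: str  # e.g. "div", "img", "text", "button"
    children: list[UINode] = field(default_factory=list)


def signature(node: UINode) -> str:
    """Canonical string for a subtree's shape: its kind plus ordered child signatures."""
    if not node.children:
        return node.kind
    return f"{node.kind}({','.join(signature(c) for c in node.children)})"


def find_motifs(root: UINode, min_count: int = 2) -> dict[str, list[UINode]]:
    """Group non-leaf subtrees by signature and keep those seen at least min_count times."""
    groups: dict[str, list[UINode]] = defaultdict(list)
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:  # leaves alone are too small to be worth a template
            groups[signature(node)].append(node)
        stack.extend(node.children)
    return {sig: nodes for sig, nodes in groups.items() if len(nodes) >= min_count}


def make_card() -> UINode:
    """Build one image/text/button card, an assumed example of a repeated motif."""
    return UINode("div", [UINode("img"), UINode("text"), UINode("button")])


# A page with a heading followed by three structurally identical cards.
page = UINode("div", [UINode("text"), make_card(), make_card(), make_card()])
motifs = find_motifs(page)
```

Here the three cards share the signature `div(img,text,button)` and would be candidates for a single component template, while the page root occurs once and is left out. Exact signature matching is the simplest possible choice; a real system would likely need tolerance for small variations, such as optional children or differing text content.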

Cite

@article{arxiv.2512.20034,
  title   = {VSA: Visual-Structural Alignment for UI-to-Code},
  author  = {Xian Wu and Ming Zhang and Zhiyu Fang and Fei Li and Bin Wang and Yong Jiang and Hao Zhou},
  journal = {arXiv preprint arXiv:2512.20034},
  year   = {2025}
}