Related papers: VSA: Visual-Structural Alignment for UI-to-Code

200 papers

Visual Question Answering (VQA) attracts much attention from both industry and academia. As a multi-modality task, it is challenging since it requires not only visual and textual understanding, but also the ability to align cross-modality…

Computer Vision and Pattern Recognition · Computer Science 2022-01-27 Peixi Xiong, Quanzeng You, Pei Yu, Zicheng Liu, Ying Wu

Image-to-image translation has played an important role in enabling synthetic data for computer vision. However, if the source and target domains have a large semantic mismatch, existing techniques often suffer from source content…

Computer Vision and Pattern Recognition · Computer Science 2022-09-07 Justin Theiss, Jay Leverett, Daeil Kim, Aayush Prakash

In the rapidly evolving fields of natural language processing and computer vision, Visual Word Sense Disambiguation (VWSD) stands as a critical, yet challenging task. The quest for models that can seamlessly integrate and interpret…

Computer Vision and Pattern Recognition · Computer Science 2024-08-13 Aristi Papastavrou, Maria Lymperaiou, Giorgos Stamou

Integrating multimodal foundation models into enterprise ecosystems presents a fundamental software architecture challenge. Architects must balance competing quality attributes: the high latency and non-determinism of vision language action…

Artificial Intelligence · Computer Science 2026-05-01 Habtom Kahsay Gidey, Alexander Lenz, Alois Knoll

Automated front-end engineering drastically reduces development cycles and minimizes manual coding overhead. While Generative AI has shown promise in translating designs to code, current solutions often produce monolithic scripts, failing…

Information Retrieval · Computer Science 2025-12-23 Chong Liu, Ming Zhang, Fei Li, Hao Zhou, Xiaoshuang Chen, Ye Yuan

This article reviews recent progress in the development of the computing framework vector symbolic architectures (VSA) (also known as hyperdimensional computing). This framework is well suited for implementation in stochastic, emerging…

Automating the transformation of user interface (UI) designs into front-end code holds significant promise for accelerating software development and democratizing design workflows. While multimodal large language models (MLLMs) can…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, Jiaming Han, Qunzhong Wang, Michael R. Lyu, Xiangyu Yue

Design-to-code generation has emerged as a promising approach to bridge the gap between design prototypes and deployable frontend code. However, existing methods often suffer from structural inconsistencies, asset misalignment, and limited…

Software Engineering · Computer Science 2025-11-07 Yongxi Chen, Lei Chen

Jigsaw puzzle solving remains challenging in computer vision, requiring an understanding of both local fragment details and global spatial relationships. While most traditional approaches only focus on visual cues like edge matching and…

Machine Learning · Computer Science 2025-10-01 Zhuoning Xu, Xinyan Liu

Recent studies on Vision-Language-Action (VLA) models have shifted from the end-to-end action-generation paradigm toward a pipeline involving task planning followed by action generation, demonstrating improved performance on various…

Computer Vision and Pattern Recognition · Computer Science 2025-06-24 Chongkai Gao, Zixuan Liu, Zhenghao Chi, Junshan Huang, Xin Fei, Yiwen Hou, Yuxuan Zhang, Yudi Lin, Zhirui Fang, Zeyu Jiang, Lin Shao

Software visualization seeks to represent software artifacts graphically in two or three dimensions, with the goal of enhancing comprehension, analysis, maintenance, and evolution of the source code. In this context, visualizations…

Software Engineering · Computer Science 2025-09-30 Anthony Savidis, Christos Vasilopoulos

In this paper, we propose VidLA, an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First, they do not capture both short-range and long-range temporal…

Computer Vision and Pattern Recognition · Computer Science 2024-03-25 Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi

Vector Symbolic Architectures combine a high-dimensional vector space with a set of carefully designed operators in order to perform symbolic computations with large numerical vectors. Major goals are the exploitation of their…

Artificial Intelligence · Computer Science 2021-12-17 Kenny Schlegel, Peer Neubert, Peter Protzel

Vision-Language-Action Models (VLAs) have shown remarkable progress towards embodied intelligence. While their architecture partially resembles that of Large Language Models (LLMs), VLAs exhibit higher complexity due to their multi-modal…

Robotics · Computer Science 2026-03-06 Hugo Buurmeijer, Carmen Amo Alonso, Aiden Swann, Marco Pavone

Vision-language models (VLMs) have shown promise in graph structure understanding, but remain limited by input-token constraints, facing scalability bottlenecks and lacking effective mechanisms to coordinate textual and visual modalities.…

Artificial Intelligence · Computer Science 2026-01-12 Shuo Han, Yukun Cao, Zezhong Ding, Zengyi Gao, S Kevin Zhou, Xike Xie

While Vision-Language-Action (VLA) models show strong promise for generalist robot control, it remains unclear whether -- and under what conditions -- the standard "scale data" recipe translates to robotics, where training data is…

Large Language Model (LLM)-based Automated Program Repair (APR) has shown strong potential on textual benchmarks, yet struggles in multimodal scenarios where bugs are reported with GUI screenshots. Existing methods typically convert images…

Software Engineering · Computer Science 2026-04-10 Zhuoyao Liu, Zhengran Zeng, Shu-Dong Huang, Yang Liu, Shikun Zhang, Wei Ye

Unsupervised domain adaptation for medical image segmentation remains a significant challenge due to substantial domain shifts across imaging modalities, such as CT and MRI. While recent vision-language representation learning methods have…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Lalit Maurya, Honghai Liu, Reyer Zwiggelaar

Vision-Language-Action (VLA) models are a promising paradigm for generalist robotic manipulation by grounding high-level semantic instructions into executable physical actions. However, prevailing approaches typically adopt a monolithic…

Robotics · Computer Science 2026-04-29 Yifei Wei, Linqing Zhong, Yi Liu, Yuxiang Lu, Xindong He, Maoqing Yao, Guanghui Ren

Vision and Language Pretraining has become the prevalent approach for tackling multimodal downstream tasks. The current trend is to move towards ever larger models and pretraining datasets. This computational headlong rush does not seem…

Computer Vision and Pattern Recognition · Computer Science 2022-10-06 Mustafa Shukor, Guillaume Couairon, Matthieu Cord