Related papers: Generative Visual Code Mobile World Models

Code2World: A GUI World Model via Renderable Code Generation

Autonomous GUI agents interact with environments by perceiving interfaces and executing actions. As a virtual sandbox, the GUI World model empowers agents with human-like foresight by enabling action-conditioned prediction. However,…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-11 · Yuhao Zheng, Li'an Zhong, Yi Wang, Rui Dai, Kaikui Liu, Xiangxiang Chu, Linyuan Lv, Philip Torr, Kevin Qinghong Lin

MobileWorldBench: Towards Semantic World Modeling For Mobile Agents

World models have shown great utility in improving the task performance of embodied agents. While prior work largely focuses on pixel-space world models, these approaches face practical limitations in GUI settings, where predicting complex…

Artificial Intelligence · Computer Science · 2025-12-17 · Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Yusuke Kato, Kazuki Kozuka, Aditya Grover

How Mobile World Model Guides GUI Agents?

Recent advances in vision-language models have enabled mobile GUI agents to perceive visual interfaces and execute user instructions, but reliable prediction of action consequences remains critical for long-horizon and high-risk…

Artificial Intelligence · Computer Science · 2026-05-25 · Weikai Xu, Kun Huang, Yunren Feng, Jiaxing Li, Yuhan Chen, Yuxuan Liu, Zhizheng Jiang, Heng Qu, Pengzhi Gao, Wei Liu, Jian Luan, Xiaolin Hu, Bo An

ViMo: A Generative Visual GUI World Model for App Agents

App agents, which autonomously operate mobile Apps through Graphical User Interfaces (GUIs), have gained significant interest in real-world applications. Yet, they often struggle with long-horizon planning, failing to find the optimal…

Human-Computer Interaction · Computer Science · 2025-05-21 · Dezhao Luo, Bohan Tang, Kang Li, Georgios Papoudakis, Jifei Song, Shaogang Gong, Jianye Hao, Jun Wang, Kun Shao

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

Recently, Multimodal Large Language Models (MLLMs) have been used as agents to control keyboard and mouse inputs by directly perceiving the Graphical User Interface (GUI) and generating corresponding commands. However, current agents…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun

Distilling Game Code World Model Generation into Lightweight Large Language Models

Large Language Models (LLMs) have shown great ability in generating executable code from natural language, opening the possibility of automatically constructing environments for AI agents. Recent work on Code World Models (CWMs)…

Artificial Intelligence · Computer Science · 2026-05-26 · Tyrone Serapio, Arjun Prakash, Haoyang Xu, Kevin Wang, Amy Greenwald

Coding the Visual World: From Image to Simulation Using Vision Language Models

The ability to construct mental models of the world is a central aspect of understanding. Similarly, visual understanding can be viewed as the ability to construct a representative model of the system depicted in an image. This work…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-27 · Sagi Eppel

World-in-World: World Models in a Closed-Loop World

Generative world models (WMs) can now simulate worlds with striking visual realism, which naturally raises the question of whether they can endow embodied agents with predictive perception for decision making. Progress on this question has…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-22 · Jiahan Zhang, Muqing Jiang, Nanru Dai, Taiming Lu, Arda Uzunoglu, Shunchi Zhang, Yana Wei, Jiahao Wang, Vishal M. Patel, Paul Pu Liang, Daniel Khashabi, Cheng Peng, Rama Chellappa, Tianmin Shu, Alan Yuille, Yilun Du, Jieneng Chen

CWM: An Open-Weights LLM for Research on Code Generation with World Models

We release Code World Model (CWM), a 32-billion-parameter open-weights LLM, to advance research on code generation with world models. To improve code understanding beyond what can be learned from training on static code alone, we mid-train…

Software Engineering · Computer Science · 2025-10-13 · FAIR CodeGen team, Jade Copet, Quentin Carbonneaux, Gal Cohen, Jonas Gehring, Jacob Kahn, Jannik Kossen, Felix Kreuk, Emily McMilin, Michel Meyer, Yuxiang Wei, David Zhang, Kunhao Zheng, Jordi Armengol-Estapé, Pedram Bashiri, Maximilian Beck, Pierre Chambon, Abhishek Charnalia, Chris Cummins, Juliette Decugis, Zacharias V. Fisches, François Fleuret, Fabian Gloeckle, Alex Gu, Michael Hassid, Daniel Haziza, Badr Youbi Idrissi, Christian Keller, Rahul Kindi, Hugh Leather, Gallil Maimon, Aram Markosyan, Francisco Massa, Pierre-Emmanuel Mazaré, Vegard Mella, Naila Murray, Keyur Muzumdar, Peter O'Hearn, Matteo Pagliardini, Dmitrii Pedchenko, Tal Remez, Volker Seeker, Marco Selvi, Oren Sultan, Sida Wang, Luca Wehrstedt, Ori Yoran, Lingming Zhang, Taco Cohen, Yossi Adi, Gabriel Synnaeve

Graph World Model

World models (WMs) demonstrate strong capabilities in prediction, generation, and planning tasks. Existing WMs primarily focus on unstructured data and cannot leverage the ubiquitous structured data, often represented as graphs, in the…

Machine Learning · Computer Science · 2025-07-15 · Tao Feng, Yexin Wu, Guanyu Lin, Jiaxuan You

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Multimodal large language models (MLLMs) are transforming the capabilities of graphical user interface (GUI) agents, facilitating their transition from controlled simulations to complex, real-world applications across various platforms.…

Artificial Intelligence · Computer Science · 2025-06-18 · Boyu Gou, Ruohan Wang, Boyuan Zheng, Yanan Xie, Cheng Chang, Yiheng Shu, Huan Sun, Yu Su

VDAWorld: World Modelling via VLM-Directed Abstraction and Simulation

Generative video models, a leading approach to world modeling, face fundamental limitations. They often violate physical and logical rules, lack interactivity, and operate as opaque black boxes ill-suited for building structured, queryable…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-15 · Felix O'Mahony, Roberto Cipolla, Ayush Tewari

Adapting Vision-Language Models for Evaluating World Models

World models - generative models that simulate environment dynamics conditioned on past observations and actions - are gaining prominence in planning, simulation, and embodied AI. However, evaluating their rollouts remains a fundamental…

Machine Learning · Computer Science · 2025-11-26 · Mariya Hendriksen, Tabish Rashid, David Bignell, Raluca Georgescu, Abdelhak Lemkhenter, Katja Hofmann, Sam Devlin, Sarah Parisot

Generative World Renderer

Scaling generative inverse and forward rendering to real-world scenarios is bottlenecked by the limited realism and temporal coherence of existing synthetic datasets. To bridge this persistent domain gap, we introduce a large-scale, dynamic…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-03 · Zheng-Hui Huang, Zhixiang Wang, Jiaming Tan, Ruihan Yu, Yidan Zhang, Bo Zheng, Yu-Lun Liu, Yung-Yu Chuang, Kaipeng Zhang

MobileDreamer: Generative Sketch World Model for GUI Agent

Mobile GUI agents have shown strong potential in real-world automation and practical applications. However, most existing agents remain reactive, making decisions mainly from current screen, which limits their performance on long-horizon…

Artificial Intelligence · Computer Science · 2026-01-08 · Yilin Cao, Yufeng Zhong, Zhixiong Zeng, Liming Zheng, Jing Huang, Haibo Qiu, Peng Shi, Wenji Mao, Wan Guanglu

GeoWorld-VLM: Geometry from World Models for Vision-Language Models

Modern Vision-Language Models (VLMs) achieve strong semantic recognition, yet remain brittle on elementary spatial relations such as left of, on, behind, and between. One cause of this failure arises before language reasoning begins: the…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-19 · Renjie Gu, Kaichen Zhou, Yan Luo, Mengyu Wang

Can World Models Benefit VLMs for World Dynamics?

Trained on internet-scale video data, generative world models are increasingly recognized as powerful world simulators that can generate consistent and plausible dynamics over structure, motion, and physics. This raises a natural question:…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-02 · Kevin Zhang, Kuangzhi Ge, Xiaowei Chi, Renrui Zhang, Shaojun Shi, Zhen Dong, Sirui Han, Shanghang Zhang

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation

Training robot policies within a learned world model is trending due to the inefficiency of real-world interactions. The established image-based world models and policies have shown prior success, but lack robust geometric information that…

Robotics · Computer Science · 2025-09-18 · Guanxing Lu, Baoxiong Jia, Puhao Li, Yixin Chen, Ziwei Wang, Yansong Tang, Siyuan Huang

Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction

The Graphical User Interface (GUI) is pivotal for human interaction with the digital world, enabling efficient device control and the completion of complex tasks. Recent progress in Large Language Models (LLMs) and Vision Language Models…

Artificial Intelligence · Computer Science · 2024-06-14 · Danyang Zhang, Zhennan Shen, Rui Xie, Situo Zhang, Tianbao Xie, Zihan Zhao, Siyuan Chen, Lu Chen, Hongshen Xu, Ruisheng Cao, Kai Yu

R-VLM: Region-Aware Vision Language Model for Precise GUI Grounding

Visual agent models for automating human activities on Graphical User Interfaces (GUIs) have emerged as a promising research direction, driven by advances in large Vision Language Models (VLMs). A critical challenge in GUI automation is the…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-09 · Joonhyung Park, Peng Tang, Sagnik Das, Srikar Appalaraju, Kunwar Yashraj Singh, R. Manmatha, Shabnam Ghadar