Related papers: VectorWorld: Efficient Streaming World Model via D…

DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT

Recent successes in autoregressive (AR) generation models, such as the GPT series in natural language processing, have motivated efforts to replicate this success in visual tasks. Some works attempt to extend this approach to autonomous…

Computer Vision and Pattern Recognition · Computer Science 2024-12-31 Xiaotao Hu, Wei Yin, Mingkai Jia, Junyuan Deng, Xiaoyang Guo, Qian Zhang, Xiaoxiao Long, Ping Tan

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

Video generation models, as one form of world models, have emerged as one of the most exciting frontiers in AI, promising agents the ability to imagine the future by modeling the temporal evolution of complex scenes. In autonomous driving,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Yang Zhou, Hao Shao, Letian Wang, Zhuofan Zong, Hongsheng Li, Steven L. Waslander

Interactive World Simulator for Robot Policy Training and Evaluation

Action-conditioned video prediction models (often referred to as world models) have shown strong potential for robotics applications, but existing approaches are often slow and struggle to capture physically consistent interactions over…

Robotics · Computer Science 2026-03-10 Yixuan Wang, Rhythm Syed, Fangyu Wu, Mengchao Zhang, Aykut Onol, Jose Barreiros, Hooshang Nayyeri, Tony Dear, Huan Zhang, Yunzhu Li

X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving

Scalable and reliable evaluation is increasingly critical in the end-to-end era of autonomous driving, where vision-language-action (VLA) policies directly map raw sensor streams to driving actions. Yet, current evaluation pipelines still…

Computer Vision and Pattern Recognition · Computer Science 2026-04-01 Chaoda Zheng, Sean Li, Jinhao Deng, Zhennan Wang, Shijia Chen, Liqiang Xiao, Ziheng Chi, Hongbin Lin, Kangjie Chen, Boyang Wang, Yu Zhang, Xianming Liu

VDAWorld: World Modelling via VLM-Directed Abstraction and Simulation

Generative video models, a leading approach to world modeling, face fundamental limitations. They often violate physical and logical rules, lack interactivity, and operate as opaque black boxes ill-suited for building structured, queryable…

Computer Vision and Pattern Recognition · Computer Science 2025-12-15 Felix O'Mahony, Roberto Cipolla, Ayush Tewari

Network-Efficient World Model Token Streaming

Generative driving world models rely on compact latent state representations that must be efficiently transmitted and synchronized across distributed compute and connected vehicles. We study network-efficient streaming of a discrete world…

Robotics · Computer Science 2026-05-12 Shatadal Mishra, Ahmadreza Moradipari, Nejib Ammar

World-in-World: World Models in a Closed-Loop World

Generative world models (WMs) can now simulate worlds with striking visual realism, which naturally raises the question of whether they can endow embodied agents with predictive perception for decision making. Progress on this question has…

Computer Vision and Pattern Recognition · Computer Science 2025-10-22 Jiahan Zhang, Muqing Jiang, Nanru Dai, Taiming Lu, Arda Uzunoglu, Shunchi Zhang, Yana Wei, Jiahao Wang, Vishal M. Patel, Paul Pu Liang, Daniel Khashabi, Cheng Peng, Rama Chellappa, Tianmin Shu, Alan Yuille, Yilun Du, Jieneng Chen

PlayWorld: Learning Robot World Models from Autonomous Play

Action-conditioned video models offer a promising path to building general-purpose robot simulators that can improve directly from data. Yet, despite training on large-scale robot datasets, current state-of-the-art video models still…

Robotics · Computer Science 2026-04-07 Tenny Yin, Zhiting Mei, Zhonghe Zheng, Miyu Yamane, David Wang, Jade Sceats, Samuel M. Bateman, Lihan Zha, Apurva Badithela, Ola Shorinwa, Anirudha Majumdar

DriveWAM: Video Generative Priors Enable Scalable World-Action Modeling for Autonomous Driving

Pretrained foundation models have become an important basis for end-to-end autonomous driving. In contrast to vision-language models pretrained primarily on static image-text pairs, video generative models capture temporal dynamics and…

Computer Vision and Pattern Recognition · Computer Science 2026-05-28 Chen Shi, Jinrui Xu, Shaoshuai Shi, Kehua Sheng, Bo Zhang, Li Jiang

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

World models, especially in autonomous driving, are trending and drawing extensive attention due to their capacity for comprehending driving environments. The established world model holds immense potential for the generation of…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Xiaofeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, Jiwen Lu

SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model

With the rapid advancement of autonomous driving technology, a lack of data has become a major obstacle to enhancing perception model accuracy. Researchers are now exploring controllable data generation using world models to diversify…

Computer Vision and Pattern Recognition · Computer Science 2025-06-27 Xinqing Li, Ruiqi Song, Qingyu Xie, Ye Wu, Nanxin Zeng, Yunfeng Ai

RenderWorld: World Model with Self-Supervised 3D Label

End-to-end autonomous driving with vision-only is not only more cost-effective compared to LiDAR-vision fusion but also more reliable than traditional methods. To achieve an economical and robust purely visual autonomous driving system, we…

Computer Vision and Pattern Recognition · Computer Science 2025-02-14 Ziyang Yan, Wenzhen Dong, Yihua Shao, Yuhang Lu, Liu Haiyang, Jingwen Liu, Haozhe Wang, Zhe Wang, Yan Wang, Fabio Remondino, Yuexin Ma

DynFlowDrive: Flow-Based Dynamic World Modeling for Autonomous Driving

Recently, world models have been incorporated into autonomous driving systems to improve planning reliability. Existing approaches typically predict future states through appearance generation or deterministic regression, which…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Xiaolu Liu, Yicong Li, Song Wang, Junbo Chen, Angela Yao, Jianke Zhu

BEVWorld: A Multimodal World Simulator for Autonomous Driving via Scene-Level BEV Latents

World models have attracted increasing attention in autonomous driving for their ability to forecast potential future scenarios. In this paper, we propose BEVWorld, a novel framework that transforms multimodal sensor inputs into a unified…

Computer Vision and Pattern Recognition · Computer Science 2025-05-01 Yumeng Zhang, Shi Gong, Kaixin Xiong, Xiaoqing Ye, Xiaofan Li, Xiao Tan, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang

TeleWorld: Towards Dynamic Multimodal Synthesis with a 4D World Model

World models aim to endow AI systems with the ability to represent, generate, and interact with dynamic environments in a coherent and temporally consistent manner. While recent video generation models have demonstrated impressive visual…

Computer Vision and Pattern Recognition · Computer Science 2026-01-05 Yabo Chen, Yuanzhi Liang, Jiepeng Wang, Tingxi Chen, Junfei Cheng, Zixiao Gu, Yuyang Huang, Zicheng Jiang, Wei Li, Tian Li, Weichen Li, Zuoxin Li, Guangce Liu, Jialun Liu, Junqi Liu, Haoyuan Wang, Qizhen Weng, Xuan'er Wu, Xunzhi Xiang, Xiaoyan Yang, Xin Zhang, Shiwen Zhang, Junyu Zhou, Chengcheng Zhou, Haibin Huang, Chi Zhang, Xuelong Li

WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

Dynamical systems theory and reinforcement learning view world evolution as latent-state dynamics driven by actions, with visual observations providing partial information about the state. Recent video world models attempt to learn this…

Computer Vision and Pattern Recognition · Computer Science 2026-03-25 Zhen Li, Zian Meng, Shuwei Shi, Wenshuo Peng, Yuwei Wu, Bo Zheng, Chuanhao Li, Kaipeng Zhang

Vehicle Dynamics Embedded World Models for Autonomous Driving

World models have gained significant attention as a promising approach for autonomous driving. By emulating human-like perception and decision-making processes, these models can predict and adapt to dynamic environments. Existing methods…

Robotics · Computer Science 2025-12-03 Huiqian Li, Wei Pan, Haodong Zhang, Jin Huang, Zhihua Zhong

Uni-World VLA: Interleaved World Modeling and Planning for Autonomous Driving

Autonomous driving requires reasoning about how the environment evolves and planning actions accordingly. Existing world-model-based approaches typically predict future scenes first and plan afterwards, resulting in open-loop imagination…

Robotics · Computer Science 2026-03-31 Qiqi Liu, Huan Xu, Jingyu Li, Bin Sun, Zhihui Hao, Dangen She, Xiatian Zhu, Li Zhang

AutoWorld: Scaling Multi-Agent Traffic Simulation with Self-Supervised World Models

Multi-agent traffic simulation is central to developing and testing autonomous driving systems. Recent data-driven simulators have achieved promising results, but rely heavily on supervised learning from labeled trajectories or semantic…

Robotics · Computer Science 2026-04-01 Mozhgan Pourkeshavatz, Tianran Liu, Nicholas Rhinehart

WebWorld: A Large-Scale World Model for Web Agent Training

Web agents require massive trajectories to generalize, yet real-world training is constrained by network latency, rate limits, and safety risks. We introduce the WebWorld series, the first open-web simulator trained at scale. While…

Artificial Intelligence · Computer Science 2026-02-17 Zikai Xiao, Jianhong Tu, Chuhang Zou, Yuxin Zuo, Zhi Li, Peng Wang, Bowen Yu, Fei Huang, Junyang Lin, Zuozhu Liu