Related papers: Learning Interactive Real-World Simulators

Interactive World Simulator for Robot Policy Training and Evaluation

Action-conditioned video prediction models (often referred to as world models) have shown strong potential for robotics applications, but existing approaches are often slow and struggle to capture physically consistent interactions over…

Robotics · Computer Science 2026-03-10 Yixuan Wang, Rhythm Syed, Fangyu Wu, Mengchao Zhang, Aykut Onol, Jose Barreiros, Hooshang Nayyeri, Tony Dear, Huan Zhang, Yunzhu Li

Pre-Trained Video Generative Models as World Simulators

Video generative models pre-trained on large-scale internet datasets have achieved remarkable success, excelling at producing realistic synthetic videos. However, they often generate clips based on static prompts (e.g., text or images),…

Computer Vision and Pattern Recognition · Computer Science 2025-02-13 Haoran He, Yang Zhang, Liang Lin, Zhongwen Xu, Ling Pan

Simulating the Visual World with Artificial Intelligence: A Roadmap

The landscape of video generation is shifting, from a focus on generating visually appealing clips to building virtual environments that support interaction and maintain physical plausibility. These developments point toward the emergence…

Artificial Intelligence · Computer Science 2026-02-09 Jingtong Yue, Ziqi Huang, Zhaoxi Chen, Xintao Wang, Pengfei Wan, Ziwei Liu

Simulating the Real World: A Unified Survey of Multimodal Generative Models

Understanding and replicating the real world is a critical challenge in Artificial General Intelligence (AGI) research. To achieve this, many existing approaches, such as world models, aim to capture the fundamental principles governing the…

Computer Vision and Pattern Recognition · Computer Science 2026-02-17 Yuqi Hu, Longguang Wang, Xian Liu, Ling-Hao Chen, Yuwei Guo, Yukai Shi, Ce Liu, Anyi Rao, Zeyu Wang, Hui Xiong

Learning to Simulate Dynamic Environments with GameGAN

Simulation is a crucial component of any robotic system. In order to simulate correctly, we need to write complex rules of the environment: how dynamic agents behave, and how the actions of each of the agents affect the behavior of others.…

Computer Vision and Pattern Recognition · Computer Science 2020-05-26 Seung Wook Kim, Yuhao Zhou, Jonah Philion, Antonio Torralba, Sanja Fidler

User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation

User simulation is an emerging interdisciplinary topic with multiple critical applications in the era of Generative AI. It involves creating an intelligent agent that mimics the actions of a human user interacting with an AI system,…

Artificial Intelligence · Computer Science 2026-04-22 Krisztian Balog, ChengXiang Zhai

From Generative Engines to Actionable Simulators: The Imperative of Physical Grounding in World Models

A world model is an AI system that simulates how an environment evolves under actions, enabling planning through imagined futures rather than reactive perception. Current world models, however, suffer from visual conflation: the mistaken…

Artificial Intelligence · Computer Science 2026-01-23 Zhikang Chen, Tingting Zhu

Video In-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators

People interact with the real world largely through visual signals, which are ubiquitous and illustrate detailed demonstrations. In this paper, we explore utilizing visual signals as a new interface for models to interact with the…

Computer Vision and Pattern Recognition · Computer Science 2025-03-20 Wentao Zhang, Junliang Guo, Tianyu He, Li Zhao, Linli Xu, Jiang Bian

Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models

Generalist robot manipulators need to learn a wide variety of manipulation skills across diverse environments. Current robot training pipelines rely on humans to provide kinesthetic demonstrations or to program simulation environments and…

Robotics · Computer Science 2023-10-30 Pushkal Katara, Zhou Xian, Katerina Fragkiadaki

World Models

We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the…

Machine Learning · Computer Science 2018-05-10 David Ha, Jürgen Schmidhuber

Generative Agents: Interactive Simulacra of Human Behavior

Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative…

Human-Computer Interaction · Computer Science 2023-08-08 Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein

Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets

Imitation learning has emerged as a promising approach towards building generalist robots. However, scaling imitation learning for large robot foundation models remains challenging due to its reliance on high-quality expert demonstrations.…

Robotics · Computer Science 2025-05-26 Chuning Zhu, Raymond Yu, Siyuan Feng, Benjamin Burchfiel, Paarth Shah, Abhishek Gupta

UISim: An Interactive Image-Based UI Simulator for Dynamic Mobile Environments

Developing and testing user interfaces (UIs) and training AI agents to interact with them are challenging due to the dynamic and diverse nature of real-world mobile environments. Existing methods often rely on cumbersome physical devices or…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Jiannan Xiang, Yun Zhu, Lei Shu, Maria Wang, Lijun Yu, Gabriel Barcik, James Lyon, Srinivas Sunkara, Jindong Chen

LidarDM: Generative LiDAR Simulation in a Generated World

We present LidarDM, a novel LiDAR generative model capable of producing realistic, layout-aware, physically plausible, and temporally coherent LiDAR videos. LidarDM stands out with two unprecedented capabilities in LiDAR generative…

Computer Vision and Pattern Recognition · Computer Science 2025-12-30 Vlas Zyrianov, Henry Che, Zhijian Liu, Shenlong Wang

Semantically Controllable Augmentations for Generalizable Robot Learning

Generalization to unseen real-world scenarios for robot manipulation requires exposure to diverse datasets during training. However, collecting large real-world datasets is intractable due to high operational costs. For robot learning to…

Robotics · Computer Science 2024-09-04 Zoey Chen, Zhao Mandi, Homanga Bharadhwaj, Mohit Sharma, Shuran Song, Abhishek Gupta, Vikash Kumar

TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors

Simulation has the potential to massively scale evaluation of self-driving systems enabling rapid development as well as safe deployment. To close the gap between simulation and the real world, we need to simulate realistic multi-agent…

Robotics · Computer Science 2021-01-19 Simon Suo, Sebastian Regalado, Sergio Casas, Raquel Urtasun

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation

Training robot policies within a learned world model is trending due to the inefficiency of real-world interactions. The established image-based world models and policies have shown prior success, but lack robust geometric information that…

Robotics · Computer Science 2025-09-18 Guanxing Lu, Baoxiong Jia, Puhao Li, Yixin Chen, Ziwei Wang, Yansong Tang, Siyuan Huang

Genie: Generative Interactive Environments

We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described…

Machine Learning · Computer Science 2024-02-26 Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel

Interactive Text Generation

Users interact with text, image, code, or other editors on a daily basis. However, machine learning models are rarely trained in the settings that reflect the interactivity between users and their editor. This is understandable as training…

Computation and Language · Computer Science 2023-11-14 Felix Faltings, Michel Galley, Baolin Peng, Kianté Brantley, Weixin Cai, Yizhe Zhang, Jianfeng Gao, Bill Dolan

GenSim: Generating Robotic Simulation Tasks via Large Language Models

Collecting large amounts of real-world interaction data to train general robotic policies is often prohibitively expensive, thus motivating the use of simulation data. However, existing methods for data generation have generally focused on…

Machine Learning · Computer Science 2024-01-23 Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Chen Bao, Yuzhe Qin, Bailin Wang, Huazhe Xu, Xiaolong Wang