Related papers: Generative World Explorer

GenEx: Generating an Explorable World

Understanding, navigating, and exploring the 3D physical real world has long been a central challenge in the development of artificial intelligence. In this work, we take a step toward this goal by introducing GenEx, a system capable of…

Computer Vision and Pattern Recognition · Computer Science 2025-01-22 Taiming Lu, Tianmin Shu, Junfei Xiao, Luoxin Ye, Jiahao Wang, Cheng Peng, Chen Wei, Daniel Khashabi, Rama Chellappa, Alan Yuille, Jieneng Chen

3D-Belief: Embodied Belief Inference via Generative 3D World Modeling

Recent advances in visual generative models have highlighted the promise of learning generative world models. However, most existing approaches frame world modeling as novel-view synthesis or future-frame prediction, emphasizing visual…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Yifan Yin, Zehao Wen, Jieneng Chen, Zehan Zheng, Nanru Dai, Haojun Shi, Suyu Ye, Aydan Huang, Zheyuan Zhang, Alan Yuille, Jianwen Xie, Ayush Tewari, Tianmin Shu

Active World-Model with 4D-informed Retrieval for Exploration and Awareness

Physical awareness, especially in a large and dynamic environment, is shaped by sensing decisions that determine observability across space, time, and scale, while observations impact the quality of sensing decisions. This loopy information…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Elaheh Vaezpour, Amirhosein Javadi, Tara Javidi

Learning Generative Interactive Environments By Trained Agent Exploration

World models are increasingly pivotal in interpreting and simulating the rules and actions of complex environments. Genie, a recent model, excels at learning from visually diverse environments but relies on costly human-collected data. We…

Computer Vision and Pattern Recognition · Computer Science 2024-10-21 Naser Kazemi, Nedko Savov, Danda Paudel, Luc Van Gool

Generative agents in the streets: Exploring the use of Large Language Models (LLMs) in collecting urban perceptions

Evaluating the surroundings to gain understanding, frame perspectives, and anticipate behavioral reactions is an inherent human trait. However, these continuous encounters are diverse and complex, posing challenges to their study and…

Computers and Society · Computer Science 2026-02-26 Deepank Verma, Olaf Mumm, Vanessa Miriam Carlow

Discovering and Achieving Goals via World Models

How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision? We decompose this question into two problems: discovering new goals and learning to reliably achieve them. We…

Machine Learning · Computer Science 2021-10-19 Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak

Align While Search: Belief-Guided Exploratory Inference for World-Grounded Embodied Agents

In this paper, we propose a test-time adaptive agent that performs exploratory inference through posterior-guided belief refinement without relying on gradient-based updates or additional training for LLM agent operating under partial…

Artificial Intelligence · Computer Science 2026-01-01 Seohui Bae, Jeonghye Kim, Youngchul Sung, Woohyung Lim

Cognitive Planning for Object Goal Navigation using Generative AI Models

Recent advancements in Generative AI, particularly in Large Language Models (LLMs) and Large Vision-Language Models (LVLMs), offer new possibilities for integrating cognitive planning into robotic systems. In this work, we present a novel…

Robotics · Computer Science 2024-11-06 Arjun P S, Andrew Melnik, Gora Chand Nandi

Exploration-Driven Generative Interactive Environments

Modern world models require costly and time-consuming collection of large video datasets with action demonstrations by people or by environment-specific agents. To simplify training, we focus on using many virtual environments for…

Computer Vision and Pattern Recognition · Computer Science 2025-04-04 Nedko Savov, Naser Kazemi, Mohammad Mahdi, Danda Pani Paudel, Xi Wang, Luc Van Gool

Generative Physical AI in Vision: A Survey

Generative Artificial Intelligence (AI) has rapidly advanced the field of computer vision by enabling machines to create and interpret visual data with unprecedented sophistication. This transformation builds upon a foundation of generative…

Computer Vision and Pattern Recognition · Computer Science 2025-04-22 Daochang Liu, Junyu Zhang, Anh-Dung Dinh, Eunbyung Park, Shichao Zhang, Ajmal Mian, Mubarak Shah, Chang Xu

Envisioning global urban development with satellite imagery and generative AI

Urban development has been a defining force in human history, shaping cities for centuries. However, past studies mostly analyze such development as predictive tasks, failing to reflect its generative nature. Therefore, this study designs a…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Kailai Sun, Yuebing Liang, Mingyi He, Yunhan Zheng, Alok Prakash, Shenhao Wang, Jinhua Zhao, Alex "Sandy" Pentland

Explore and Explain: Self-supervised Navigation and Recounting

Embodied AI has been recently gaining attention as it aims to foster the development of autonomous and intelligent agents. In this paper, we devise a novel embodied setting in which an agent needs to explore a previously unknown environment…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

WorldGen: A Large Scale Generative Simulator

In the era of deep learning, data is the critical determining factor in the performance of neural network models. Generating large datasets suffers from various difficulties such as scalability, cost efficiency and photorealism. To avoid…

Computer Vision and Pattern Recognition · Computer Science 2022-10-04 Chahat Deep Singh, Riya Kumari, Cornelia Fermüller, Nitin J. Sanket, Yiannis Aloimonos

Where Common Knowledge Cannot Be Formed, Common Belief Can -- Planning with Multi-Agent Belief Using Group Justified Perspectives

Epistemic planning is the sub-field of AI planning that focuses on changing knowledge and belief. It is important in both multi-agent domains where agents need to have knowledge/belief regarding the environment, but also the beliefs of…

Artificial Intelligence · Computer Science 2025-10-20 Guang Hu, Tim Miller, Nir Lipovetzky

What if? Emulative Simulation with World Models for Situated Reasoning

Situated reasoning often relies on active exploration, yet in many real-world scenarios such exploration is infeasible due to physical constraints of robots or safety concerns of visually impaired users. Given only a limited observation,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-09 Ruiping Liu, Yufan Chen, Yuheng Zhang, Junwei Zheng, Kunyu Peng, Chengzhi Wu, Chenguang Huang, Di Wen, Jiaming Zhang, Kailun Yang, Rainer Stiefelhagen

Creating an AI Observer: Generative Semantic Workspaces

An experienced human Observer reading a document -- such as a crime report -- creates a succinct plot-like "Working Memory" comprising different actors, their prototypical roles and states at any point, their evolution over…

Computation and Language · Computer Science 2024-06-10 Pavan Holur, Shreyas Rajesh, David Chong, Vwani Roychowdhury

Neural representation in active inference: using generative models to interact with -- and understand -- the lived world

This paper considers neural representation through the lens of active inference, a normative framework for understanding brain function. It delves into how living organisms employ generative models to minimize the discrepancy between…

Neurons and Cognition · Quantitative Biology 2023-10-24 Giovanni Pezzulo, Leo D'Amato, Francesco Mannella, Matteo Priorelli, Toon Van de Maele, Ivilin Peev Stoianov, Karl Friston

Thinking with Generated Images

We present Thinking with Generated Images, a novel paradigm that fundamentally transforms how large multimodal models (LMMs) engage with visual reasoning by enabling them to natively think across text and vision modalities through…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Ethan Chern, Zhulin Hu, Steffi Chern, Siqi Kou, Jiadi Su, Yan Ma, Zhijie Deng, Pengfei Liu

Generative Temporal Models with Spatial Memory for Partially Observed Environments

In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent's representations during training or via use as part of an explicit planning…

Machine Learning · Statistics 2018-07-20 Marco Fraccaro, Danilo Jimenez Rezende, Yori Zwols, Alexander Pritzel, S. M. Ali Eslami, Fabio Viola

Learning 3D Persistent Embodied World Models

The ability to simulate the effects of future actions on the world is a crucial ability of intelligent embodied agents, enabling agents to anticipate the effects of their actions and make plans accordingly. While a large body of existing…

Computer Vision and Pattern Recognition · Computer Science 2025-05-12 Siyuan Zhou, Yilun Du, Yuncong Yang, Lei Han, Peihao Chen, Dit-Yan Yeung, Chuang Gan