Related papers: Exploration-Driven Generative Interactive Environm…

Learning Generative Interactive Environments By Trained Agent Exploration

World models are increasingly pivotal in interpreting and simulating the rules and actions of complex environments. Genie, a recent model, excels at learning from visually diverse environments but relies on costly human-collected data. We…

Computer Vision and Pattern Recognition · Computer Science 2024-10-21 Naser Kazemi, Nedko Savov, Danda Paudel, Luc Van Gool

Genie: Generative Interactive Environments

We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described…

Machine Learning · Computer Science 2024-02-26 Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktäschel

Learning To Explore With Predictive World Model Via Self-Supervised Learning

Autonomous artificial agents must be able to learn behaviors in complex environments without humans to design tasks and rewards. Designing these functions for each environment is not feasible, thus motivating the development of intrinsic…

Machine Learning · Computer Science 2025-02-20 Alana Santana, Paula P. Costa, Esther L. Colombini

Behavioral Exploration: Learning to Explore via In-Context Adaptation

Developing autonomous agents that quickly explore an environment and adapt their behavior online is a canonical challenge in robotics and machine learning. While humans are able to achieve such fast online exploration and adaptation, often…

Machine Learning · Computer Science 2025-07-15 Andrew Wagenmaker, Zhiyuan Zhou, Sergey Levine

LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark

Mobile GUI agents show promise in automating tasks but face generalization challenges in diverse real-world scenarios. Traditional approaches using pre-training or fine-tuning with massive datasets struggle with the diversity of mobile…

Human-Computer Interaction · Computer Science 2025-04-21 Guangyi Liu, Pengxiang Zhao, Liang Liu, Zhiming Chen, Yuxiang Chai, Shuai Ren, Hao Wang, Shibo He, Wenchao Meng

AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning

Humans naturally adapt to diverse environments by learning underlying rules across worlds with different dynamics, observations, and reward structures. In contrast, existing agents typically demonstrate improvements via self-evolving within…

Artificial Intelligence · Computer Science 2025-12-04 Jiayi Zhang, Yiran Peng, Fanqi Kong, Cheng Yang, Yifan Wu, Zhaoyang Yu, Jinyu Xiang, Jianhao Ruan, Jinlin Wang, Maojia Song, HongZhang Liu, Xiangru Tang, Bang Liu, Chenglin Wu, Yuyu Luo

ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World

The rapid progress of large language models (LLMs) has sparked growing interest in building Artificial General Intelligence (AGI) within Graphical User Interface (GUI) environments. However, existing GUI agents based on LLMs or…

Artificial Intelligence · Computer Science 2025-05-27 Runliang Niu, Jinglong Ji, Yi Chang, Qi Wang

Multi-agent evolutionary systems for the generation of complex virtual worlds

Modern films, games and virtual reality applications are dependent on convincing computer graphics. Highly complex models are a requirement for the successful delivery of many scenes and environments. While workflows such as rendering,…

Neural and Evolutionary Computing · Computer Science 2016-04-21 Jan Kruse, Andy M. Connor

Learning to Play with Intrinsically-Motivated Self-Aware Agents

Infants are experts at playing, with an amazing ability to generate novel structured behaviors in unstructured environments that lack clear extrinsic reward signals. We seek to mathematically formalize these abilities using a neural network…

Machine Learning · Computer Science 2018-11-01 Nick Haber, Damian Mrowca, Li Fei-Fei, Daniel L. K. Yamins

GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning

We introduce GenAgent, unifying visual understanding and generation through an agentic multimodal model. Unlike unified models that face expensive training costs and understanding-generation trade-offs, GenAgent decouples these capabilities…

Computer Vision and Pattern Recognition · Computer Science 2026-01-29 Kaixun Jiang, Yuzheng Wang, Junjie Zhou, Pandeng Li, Zhihang Liu, Chen-Wei Xie, Zhaoyu Chen, Yun Zheng, Wenqiang Zhang

GenEx: Generating an Explorable World

Understanding, navigating, and exploring the 3D physical real world has long been a central challenge in the development of artificial intelligence. In this work, we take a step toward this goal by introducing GenEx, a system capable of…

Computer Vision and Pattern Recognition · Computer Science 2025-01-22 Taiming Lu, Tianmin Shu, Junfei Xiao, Luoxin Ye, Jiahao Wang, Cheng Peng, Chen Wei, Daniel Khashabi, Rama Chellappa, Alan Yuille, Jieneng Chen

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

Video generation models, as one form of world models, have emerged as one of the most exciting frontiers in AI, promising agents the ability to imagine the future by modeling the temporal evolution of complex scenes. In autonomous driving,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Yang Zhou, Hao Shao, Letian Wang, Zhuofan Zong, Hongsheng Li, Steven L. Waslander

GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning

With the rapid development of Large Vision Language Models, the focus of Graphical User Interface (GUI) agent tasks shifts from single-screen tasks to complex screen navigation challenges. However, real-world GUI environments, such as PC…

Computer Vision and Pattern Recognition · Computer Science 2025-12-03 Haolong Yan, Yeqing Shen, Xin Huang, Jia Wang, Kaijun Tan, Zhixuan Liang, Hongxin Li, Zheng Ge, Osamu Yoshie, Si Li, Xiangyu Zhang, Daxin Jiang

Generative World Explorer

Planning with partial observation is a central challenge in embodied AI. A majority of prior works have tackled this challenge by developing agents that physically explore their environment to update their beliefs about the world state. In…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Taiming Lu, Tianmin Shu, Alan Yuille, Daniel Khashabi, Jieneng Chen

EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds

Learning an agent model that behaves like humans (capable of jointly perceiving the environment, predicting the future, and taking actions from a first-person perspective) is a fundamental challenge in computer vision. Existing methods…

Computer Vision and Pattern Recognition · Computer Science 2025-09-12 Lu Chen, Yizhou Wang, Shixiang Tang, Qianhong Ma, Tong He, Wanli Ouyang, Xiaowei Zhou, Hujun Bao, Sida Peng

Learning with Challenges: Adaptive Difficulty-Aware Data Generation for Mobile GUI Agent Training

Large-scale, high-quality interaction trajectories are essential for advancing mobile Graphical User Interface (GUI) agents. While existing methods typically rely on labor-intensive human demonstrations or automated model exploration to…

Artificial Intelligence · Computer Science 2026-02-02 Linjia Kang, Zhimin Wang, Yongkang Zhang, Duo Wu, Jinghe Wang, Ming Ma, Haopeng Yan, Zhi Wang

TravelAgent: Generative Agents in the Built Environment

Understanding human behavior in built environments is critical for designing functional, user-centered urban spaces. Traditional approaches, such as manual observations, surveys, and simplified simulations, often fail to capture the…

Artificial Intelligence · Computer Science 2024-12-30 Ariel Noyman, Kai Hu, Kent Larson

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

Agents powered by large language models have shown remarkable abilities in solving complex tasks. However, most agent systems remain reactive, limiting their effectiveness in scenarios requiring foresight and autonomous decision-making. In…

Artificial Intelligence · Computer Science 2024-12-04 Yaxi Lu, Shenzhi Yang, Cheng Qian, Guirong Chen, Qinyu Luo, Yesai Wu, Huadong Wang, Xin Cong, Zhong Zhang, Yankai Lin, Weiwen Liu, Yasheng Wang, Zhiyuan Liu, Fangming Liu, Maosong Sun

GenAI-based Multi-Agent Reinforcement Learning towards Distributed Agent Intelligence: A Generative-RL Agent Perspective

Multi-agent reinforcement learning faces fundamental challenges that conventional approaches have failed to overcome: exponentially growing joint action spaces, non-stationary environments where simultaneous learning creates moving targets,…

Artificial Intelligence · Computer Science 2025-07-15 Hang Wang, Junshan Zhang

RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

Standard reinforcement learning (RL) for large language model (LLM) agents typically optimizes extrinsic rewards, prioritizing isolated task completion over continual adaptation. Consequently, agents often converge to suboptimal policies…

Artificial Intelligence · Computer Science 2026-03-31 Xiaoying Zhang, Zichen Liu, Yipeng Zhang, Xia Hu, Wenqi Shao