Related papers: Redefining Robot Generalization Through Interactiv…

Transforming Monolithic Foundation Models into Embodied Multi-Agent Architectures for Human-Robot Collaboration

Foundation models have become central to unifying perception and planning in robotics, yet real-world deployment exposes a mismatch between their monolithic assumption that a single model can handle all cognitive functions and the…

Robotics · Computer Science 2025-12-02 Nan Sun, Bo Mao, Yongchang Li, Chenxu Wang, Di Guo, Huaping Liu

Transferring Foundation Models for Generalizable Robotic Manipulation

Improving the generalization capabilities of general-purpose robotic manipulation agents in the real world has long been a significant challenge. Existing approaches often rely on collecting large-scale robotic data which is costly and…

Robotics · Computer Science 2025-02-10 Jiange Yang, Wenhui Tan, Chuhao Jin, Keling Yao, Bei Liu, Jianlong Fu, Ruihua Song, Gangshan Wu, Limin Wang

An Interactive Agent Foundation Model

The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent…

Artificial Intelligence · Computer Science 2024-06-18 Zane Durante, Bidipta Sarkar, Ran Gong, Rohan Taori, Yusuke Noda, Paul Tang, Ehsan Adeli, Shrinidhi Kowshika Lakshmikanth, Kevin Schulman, Arnold Milstein, Demetri Terzopoulos, Ade Famoti, Noboru Kuno, Ashley Llorens, Hoi Vo, Katsu Ikeuchi, Li Fei-Fei, Jianfeng Gao, Naoki Wake, Qiuyuan Huang

Position Paper: Agent AI Towards a Holistic Intelligence

Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments. In leveraging the power of foundation models, it is crucial for AI research to pivot away from…

Artificial Intelligence · Computer Science 2024-03-05 Qiuyuan Huang, Naoki Wake, Bidipta Sarkar, Zane Durante, Ran Gong, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Noboru Kuno, Ade Famoti, Ashley Llorens, John Langford, Hoi Vo, Li Fei-Fei, Katsu Ikeuchi, Jianfeng Gao

A Survey on Robotics with Foundation Models: toward Embodied AI

While the exploration for embodied AI has spanned multiple decades, it remains a persistent challenge to endow agents with human-level intelligence, including perception, learning, reasoning, decision-making, control, and generalization…

Robotics · Computer Science 2024-02-07 Zhiyuan Xu, Kun Wu, Junjie Wen, Jinming Li, Ning Liu, Zhengping Che, Jian Tang

A Multimodal Framework for Human-Multi-Agent Interaction

Human-robot interaction is increasingly moving toward multi-robot, socially grounded environments. Existing systems struggle to integrate multimodal perception, embodied expression, and coordinated decision-making in a unified framework.…

Robotics · Computer Science 2026-03-25 Shaid Hasan, Breenice Lee, Sujan Sarker, Tariq Iqbal

Leveraging Foundation Models for Enhancing Robot Perception and Action

This thesis investigates how foundation models can be systematically leveraged to enhance robotic capabilities, enabling more effective localization, interaction, and manipulation in unstructured environments. The work is structured around…

Robotics · Computer Science 2025-11-03 Reihaneh Mirjalili

Bidirectional Intent Communication: A Role for Large Foundation Models

Integrating multimodal foundation models has significantly enhanced autonomous agents' language comprehension, perception, and planning capabilities. However, while existing works adopt a task-centric approach with minimal human…

Robotics · Computer Science 2024-08-21 Tim Schreiter, Rishi Hazra, Jens Rüppel, Andrey Rudenko

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey

The realization of universal robots is an ultimate goal of researchers. However, a key hurdle in achieving this goal lies in the robots' ability to manipulate objects in their unstructured surrounding environments according to different…

Robotics · Computer Science 2025-11-11 Dingzhe Li, Yixiang Jin, Yuhao Sun, Yong A, Hongze Yu, Jun Shi, Xiaoshuai Hao, Peng Hao, Huaping Liu, Xiang Li, Xinde Li, Fuchun Sun, Jianwei Zhang, Bin Fang

Designing Social Robots with Ethical, User-Adaptive Explainability in the Era of Foundation Models

Foundation models are increasingly embedded in social robots, mediating not only what they say and do but also how they adapt to users over time. This shift renders traditional "one-size-fits-all" explanation strategies especially…

Robotics · Computer Science 2026-03-03 Fethiye Irmak Dogan, Alva Markelius, Hatice Gunes

Integrating Visual Foundation Models for Enhanced Robot Manipulation and Motion Planning: A Layered Approach

This paper presents a novel layered framework that integrates visual foundation models to improve robot manipulation tasks and motion planning. The framework consists of five layers: Perception, Cognition, Planning, Execution, and Learning.…

Robotics · Computer Science 2023-09-21 Chen Yang, Peng Zhou, Jiaming Qi

Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

Building general-purpose robots that operate seamlessly in any environment, with any object, and utilizing various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. However, as a community, we have…

Robotics · Computer Science 2024-10-02 Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Hao-Shu Fang, Shibo Zhao, Shayegan Omidshafiei, Dong-Ki Kim, Ali-akbar Agha-mohammadi, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Chen Wang, Zsolt Kira, Fei Xia, Yonatan Bisk

GRAPPA: Generalizing and Adapting Robot Policies via Online Agentic Guidance

Robot learning approaches such as behavior cloning and reinforcement learning have shown great promise in synthesizing robot skills from human demonstrations in specific environments. However, these approaches often require task-specific…

Robotics · Computer Science 2025-04-09 Arthur Bucker, Pablo Ortega-Kral, Jonathan Francis, Jean Oh

Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review

Rapid advancements in foundation models, including Large Language Models, Vision-Language Models, Multimodal Large Language Models, and Vision-Language-Action Models, have opened new avenues for embodied AI in mobile service robotics. By…

Robotics · Computer Science 2026-03-11 Matthew Lisondra, Beno Benhabib, Goldie Nejat

MARC: A multi-agent robots control framework for enhancing reinforcement learning in construction tasks

Letting robots emulate human behavior has always posed a challenge, particularly in scenarios involving multiple robots. In this paper, we presented a framework aimed at achieving multi-agent reinforcement learning for robot control in…

Robotics · Computer Science 2023-05-25 Kangkang Duan, Christine Wun Ki Suen, Zhengbo Zou

Towards Interpretable Foundation Models of Robot Behavior: A Task Specific Policy Generation Approach

Foundation models are a promising path toward general-purpose and user-friendly robots. The prevalent approach involves training a generalist policy that, like a reinforcement learning policy, uses observations to output actions. Although…

Robotics · Computer Science 2024-07-12 Isaac Sheidlower, Reuben Aronson, Elaine Schaertl Short

Foundation Models in Robotics: Applications, Challenges, and the Future

We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored for specific tasks, which limits their adaptability across diverse applications. In…

Robotics · Computer Science 2023-12-14 Roya Firoozi, Johnathan Tucker, Stephen Tian, Anirudha Majumdar, Jiankai Sun, Weiyu Liu, Yuke Zhu, Shuran Song, Ashish Kapoor, Karol Hausman, Brian Ichter, Danny Driess, Jiajun Wu, Cewu Lu, Mac Schwager

Towards Forceful Robotic Foundation Models: a Literature Survey

This article reviews contemporary methods for integrating force, including both proprioception and tactile sensing, in robot manipulation policy learning. We conduct a comparative analysis on various approaches for sensing force, data…

Robotics · Computer Science 2025-04-17 William Xie, Nikolaus Correll

Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives

Recent advances in vision, language, and multimodal learning have substantially accelerated progress in robotic foundation models, with robot manipulation remaining a central and challenging problem. This survey examines robot manipulation…

Robotics · Computer Science 2025-12-30 Shuanghao Bai, Wenxuan Song, Jiayi Chen, Yuheng Ji, Zhide Zhong, Jin Yang, Han Zhao, Wanqi Zhou, Zhe Li, Pengxiang Ding, Cheng Chi, Chang Xu, Xiaolong Zheng, Donglin Wang, Haoang Li, Shanghang Zhang, Badong Chen

Zero-shot adaptable task planning for autonomous construction robots: a comparative study of lightweight single and multi-AI agent systems

Robots are expected to play a major role in the future construction industry but face challenges due to high costs and difficulty adapting to dynamic tasks. This study explores the potential of foundation models to enhance the adaptability…

Robotics · Computer Science 2026-01-21 Hossein Naderi, Alireza Shojaei, Lifu Huang, Philip Agee, Kereshmeh Afsari, Abiola Akanmu