Related papers: A Multimodal Framework for Human-Multi-Agent Inter…

Agent AI: Surveying the Horizons of Multimodal Interaction

Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems…

Artificial Intelligence · Computer Science 2024-01-29 Zane Durante , Qiuyuan Huang , Naoki Wake , Ran Gong , Jae Sung Park , Bidipta Sarkar , Rohan Taori , Yusuke Noda , Demetri Terzopoulos , Yejin Choi , Katsushi Ikeuchi , Hoi Vo , Li Fei-Fei , Jianfeng Gao

Transforming Monolithic Foundation Models into Embodied Multi-Agent Architectures for Human-Robot Collaboration

Foundation models have become central to unifying perception and planning in robotics, yet real-world deployment exposes a mismatch between their monolithic assumption that a single model can handle all cognitive functions and the…

Robotics · Computer Science 2025-12-02 Nan Sun , Bo Mao , Yongchang Li , Chenxu Wang , Di Guo , Huaping Liu

Toward a Unified Framework for Collaborative Design of Human-AI Interaction

Human-computer interaction is shifting from screen-based systems to multimodal interfaces where artificial-intelligence-powered systems increasingly interpret user intent through speech, gesture, and gaze. Yet users rarely understand how…

Human-Computer Interaction · Computer Science 2026-05-05 Ankur Bhatt , Sven Mayer

Conversational Language Models for Human-in-the-Loop Multi-Robot Coordination

With the increasing prevalence and diversity of robots interacting in the real world, there is a need for flexible, on-the-fly planning and cooperation. Large Language Models are starting to be explored in a multimodal setup for…

Robotics · Computer Science 2024-03-01 William Hunt , Toby Godfrey , Mohammad D. Soorati

M2HRI: An LLM-Driven Multimodal Multi-Agent Framework for Personalized Human-Robot Interaction

Multi-robot systems hold significant promise for social environments such as homes and hospitals, yet existing multi-robot works treat robots as functionally identical, overlooking how robots' individual identities shape user perception and…

Robotics · Computer Science 2026-04-15 Shaid Hasan , Breenice Lee , Sujan Sarker , Tariq Iqbal

Multimodal Human-Autonomous Agents Interaction Using Pre-Trained Language and Visual Foundation Models

In this paper, we extended the method proposed in [21] to enable humans to interact naturally with autonomous agents through vocal and textual conversations. Our extended method exploits the inherent capabilities of pre-trained large…

Robotics · Computer Science 2024-12-31 Linus Nwankwo , Elmar Rueckert

Chat with the Environment: Interactive Multimodal Perception Using Large Language Models

Programming robot behavior in a complex world faces challenges on multiple levels, from dextrous low-level skills to high-level planning and reasoning. Recent pre-trained Large Language Models (LLMs) have shown remarkable reasoning ability…

Robotics · Computer Science 2023-10-12 Xufeng Zhao , Mengdi Li , Cornelius Weber , Muhammad Burhan Hafez , Stefan Wermter

Design of Seamless Multi-modal Interaction Framework for Intelligent Virtual Agents in Wearable Mixed Reality Environment

In this paper, we present the design of a multimodal interaction framework for intelligent virtual agents in wearable mixed reality environments, especially for interactive applications at museums, botanical gardens, and similar places.…

Human-Computer Interaction · Computer Science 2025-03-26 Ghazanfar Ali , Hong-Quan Le , Junho Kim , Seoung-won Hwang , Jae-In Hwang

Multi-agent Embodied AI: Advances and Future Directions

Embodied artificial intelligence (Embodied AI) plays a pivotal role in the application of advanced technologies in the intelligent era, where AI systems are integrated with physical bodies that enable them to perceive, reason, and interact…

Artificial Intelligence · Computer Science 2025-06-24 Zhaohan Feng , Ruiqi Xue , Lei Yuan , Yang Yu , Ning Ding , Meiqin Liu , Bingzhao Gao , Jian Sun , Xinhu Zheng , Gang Wang

Bidirectional Intent Communication: A Role for Large Foundation Models

Integrating multimodal foundation models has significantly enhanced autonomous agents' language comprehension, perception, and planning capabilities. However, while existing works adopt a task-centric approach with minimal human…

Robotics · Computer Science 2024-08-21 Tim Schreiter , Rishi Hazra , Jens Rüppel , Andrey Rudenko

Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction

With the recent development of natural language generation models - termed as large language models (LLMs) - a potential use case has opened up to improve the way that humans interact with robot assistants. These LLMs should be able to…

Multiagent Systems · Computer Science 2024-11-27 Mitchell Rosser , Marc G. Carmichael

Embodiment in multimodal large language models

Multimodal Large Language Models (MLLMs) have demonstrated extraordinary progress in bridging textual and visual inputs. However, MLLMs still face challenges in situated physical and social interactions in sensorially rich, multimodal and…

Neurons and Cognition · Quantitative Biology 2025-10-17 Akila Kadambi , Lisa Aziz-Zadeh , Antonio Damasio , Marco Iacoboni , Srini Narayanan

Language and Sketching: An LLM-driven Interactive Multimodal Multitask Robot Navigation Framework

The socially-aware navigation system has evolved to adeptly avoid various obstacles while performing multiple tasks, such as point-to-point navigation, human-following, and -guiding. However, a prominent gap persists: in Human-Robot…

Robotics · Computer Science 2024-03-22 Weiqin Zu , Wenbin Song , Ruiqing Chen , Ze Guo , Fanglei Sun , Zheng Tian , Wei Pan , Jun Wang

Enhancing Explainability with Multimodal Context Representations for Smarter Robots

Artificial Intelligence (AI) has significantly advanced in recent years, driving innovation across various fields, especially in robotics. Even though robots can perform complex tasks with increasing autonomy, challenges remain in ensuring…

Human-Computer Interaction · Computer Science 2025-03-24 Anargh Viswanath , Lokesh Veeramacheneni , Hendrik Buschmeier

Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents

We introduce the concept of "empathic grounding" in conversational agents as an extension of Clark's conceptualization of grounding in conversation in which the grounding criterion includes listener empathy for the speaker's affective…

Human-Computer Interaction · Computer Science 2024-07-03 Mehdi Arjmand , Farnaz Nouraei , Ian Steenstra , Timothy Bickmore

Bi-Directional Mental Model Reconciliation for Human-Robot Interaction with Large Language Models

In human-robot interactions, human and robot agents maintain internal mental models of their environment, their shared task, and each other. The accuracy of these representations depends on each agent's ability to perform theory of mind,…

Robotics · Computer Science 2025-03-11 Nina Moorman , Michelle Zhao , Matthew B. Luebbers , Sanne Van Waveren , Reid Simmons , Henny Admoni , Sonia Chernova , Matthew Gombolay

EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents

Heterogeneous multi-robot systems (HMRS) have emerged as a powerful approach for tackling complex tasks that single robots cannot manage alone. Current large-language-model-based multi-agent systems (LLM-based MAS) have shown success in…

Robotics · Computer Science 2025-02-18 Junting Chen , Checheng Yu , Xunzhe Zhou , Tianqi Xu , Yao Mu , Mengkang Hu , Wenqi Shao , Yikai Wang , Guohao Li , Lin Shao

A MultiModal Social Robot Toward Personalized Emotion Interaction

Human emotions are expressed through multiple modalities, including verbal and non-verbal information. Moreover, the affective states of human users can be the indicator for the level of engagement and successful interaction, suitable for…

Robotics · Computer Science 2021-10-12 Baijun Xie , Chung Hyuk Park

A Roadmap for Embodied and Social Grounding in LLMs

The fusion of Large Language Models (LLMs) and robotic systems has led to a transformative paradigm in the robotic field, offering unparalleled capabilities not only in the communication domain but also in skills like multimodal input…

Robotics · Computer Science 2025-02-18 Sara Incao , Carlo Mazzola , Giulia Belgiovine , Alessandra Sciutti

Multi-Agent Systems for Robotic Autonomy with LLMs

Since the advent of Large Language Models (LLMs), research based on such models has maintained significant academic attention and impact, especially in AI and robotics. In this paper, we propose a multi-agent framework with LLMs to…

Robotics · Computer Science 2025-05-12 Junhong Chen , Ziqi Yang , Haoyuan G Xu , Dandan Zhang , George Mylonas