Related papers: Embodied Executable Policy Learning with Language-…

Generating Executable Action Plans with Environmentally-Aware Language Models

Large Language Models (LLMs) trained using massive text datasets have recently shown promise in generating action plans for robotic agents from high level text queries. However, these models typically do not consider the robot's…

Robotics · Computer Science 2023-05-03 Maitrey Gramopadhye, Daniel Szafir

Large Language Models as Generalizable Policies for Embodied Tasks

We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as…

Machine Learning · Computer Science 2024-04-17 Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev

LuciBot: Automated Robot Policy Learning from Generated Videos

Automatically generating training supervision for embodied tasks is crucial, as manual designing is tedious and not scalable. While prior works use large language models (LLMs) or vision-language models (VLMs) to generate rewards, these…

Computer Vision and Pattern Recognition · Computer Science 2025-03-14 Xiaowen Qiu, Yian Wang, Jiting Cai, Zhehuan Chen, Chunru Lin, Tsun-Hsuan Wang, Chuang Gan

Embodied Spatial Intelligence: from Implicit Scene Modeling to Spatial Reasoning

This thesis introduces "Embodied Spatial Intelligence" to address the challenge of creating robots that can perceive and act in the real world based on natural language instructions. To bridge the gap between Large Language Models (LLMs)…

Robotics · Computer Science 2025-09-03 Jiading Fang

Learning Structured Robot Policies from Vision-Language Models via Synthetic Neuro-Symbolic Supervision

Vision-Language Models (VLMs) have recently demonstrated strong capabilities in mapping multimodal observations to robot behaviors. However, most current approaches rely on end-to-end visuomotor policies that remain opaque and difficult to…

Robotics · Computer Science 2026-05-18 Alessandro Adami, Tommaso Tubaldo, Marco Todescato, Ruggero Carli, Pietro Falco

Action Contextualization: Adaptive Task Planning and Action Tuning using Large Language Models

Large Language Models (LLMs) present a promising frontier in robotic task planning by leveraging extensive human knowledge. Nevertheless, the current literature often overlooks the critical aspects of robots' adaptability and error…

Robotics · Computer Science 2024-11-27 Sthithpragya Gupta, Kunpeng Yao, Loïc Niederhauser, Aude Billard

Realizing Video Summarization from the Path of Language-based Semantic Understanding

The recent development of Video-based Large Language Models (VideoLLMs) has significantly advanced video summarization by aligning video features and, in some cases, audio features with Large Language Models (LLMs). Each of these VideoLLMs…

Computer Vision and Pattern Recognition · Computer Science 2024-10-08 Kuan-Chen Mu, Zhi-Yi Chin, Wei-Chen Chiu

From Actions to Words: Towards Abstractive-Textual Policy Summarization in RL

Explaining reinforcement learning agents is challenging because policies emerge from complex reward structures and neural representations that are difficult for humans to interpret. Existing approaches often rely on curated demonstrations…

Machine Learning · Computer Science 2026-01-09 Sahar Admoni, Assaf Hallak, Yftah Ziser, Omer Ben-Porat, Ofra Amir

Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning

Recent advances in large language models (LLMs) have enabled the automatic generation of executable code for task planning and control in embodied agents such as robots, demonstrating the potential of LLM-based embodied intelligence.…

Artificial Intelligence · Computer Science 2025-10-27 Sanghyun Ahn, Wonje Choi, Junyong Lee, Jinwoo Park, Honguk Woo

LEMMo-Plan: LLM-Enhanced Learning from Multi-Modal Demonstration for Planning Sequential Contact-Rich Manipulation Tasks

Large Language Models (LLMs) have gained popularity in task planning for long-horizon manipulation tasks. To enhance the validity of LLM-generated plans, visual demonstrations and online videos have been widely employed to guide the…

Robotics · Computer Science 2025-03-12 Kejia Chen, Zheng Shen, Yue Zhang, Lingyun Chen, Fan Wu, Zhenshan Bing, Sami Haddadin, Alois Knoll

Prompt, Plan, Perform: LLM-based Humanoid Control via Quantized Imitation Learning

In recent years, reinforcement learning and imitation learning have shown great potential for controlling humanoid robots' motion. However, these methods typically create simulation environments and rewards for specific tasks, resulting in…

Robotics · Computer Science 2024-08-01 Jingkai Sun, Qiang Zhang, Yiqun Duan, Xiaoyang Jiang, Chong Cheng, Renjing Xu

Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models

Large Language Models (LLMs) have emerged as a tool for robots to generate task plans using common sense reasoning. For the LLM to generate actionable plans, scene context must be provided, often through a map. Recent works have shifted from…

Robotics · Computer Science 2024-09-25 Mike Zhang, Kaixian Qu, Vaishakh Patil, Cesar Cadena, Marco Hutter

TR-LLM: Integrating Trajectory Data for Scene-Aware LLM-Based Human Action Prediction

Accurate prediction of human behavior is crucial for AI systems to effectively support real-world applications, such as autonomous robots anticipating and assisting with human tasks. Real-world scenarios frequently present challenges such…

Human-Computer Interaction · Computer Science 2025-07-21 Kojiro Takeyama, Yimeng Liu, Misha Sra

Simulating User Agents for Embodied Conversational-AI

Embodied agents designed to assist users with tasks must engage in natural language interactions, interpret instructions, execute actions, and communicate effectively to resolve issues. However, collecting large-scale, diverse datasets of…

Computation and Language · Computer Science 2024-11-01 Daniel Philipov, Vardhan Dongre, Gokhan Tur, Dilek Hakkani-Tür

Code as Policies: Language Model Programs for Embodied Control

Large language models (LLMs) trained on code completion have been shown to be capable of synthesizing simple Python programs from docstrings [1]. We find that these code-writing LLMs can be re-purposed to write robot policy code, given…

Robotics · Computer Science 2023-05-26 Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng

Using Large Language Models to Detect Socially Shared Regulation of Collaborative Learning

The field of learning analytics has made notable strides in automating the detection of complex learning processes in multimodal data. However, most advancements have focused on individualized problem-solving instead of collaborative,…

Machine Learning · Computer Science 2026-01-09 Jiayi Zhang, Conrad Borchers, Clayton Cohn, Namrata Srivastava, Caitlin Snyder, Siyuan Guo, Ashwin T S, Naveeduddin Mohammed, Haley Noh, Gautam Biswas

Towards Natural Language-Driven Assembly Using Foundation Models

Large Language Models (LLMs) and strong vision models have enabled rapid research and development in the field of Vision-Language-Action models that enable robotic control. The main objective of these methods is to develop a generalist…

Robotics · Computer Science 2024-06-25 Omkar Joglekar, Tal Lancewicki, Shir Kozlovsky, Vladimir Tchuiev, Zohar Feldman, Dotan Di Castro

Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning

Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they face significant challenges in embodied task planning scenarios that require continuous environmental understanding and action generation.…

Computation and Language · Computer Science 2025-07-01 Zhaoye Fei, Li Ji, Siyin Wang, Junhao Shi, Jingjing Gong, Xipeng Qiu

Pre-Trained Language Models for Interactive Decision-Making

Language model (LM) pre-training is useful in many language processing tasks. But can pre-trained LMs be further leveraged for more general machine learning problems? We propose an approach for using LMs to scaffold learning and…

Machine Learning · Computer Science 2022-11-01 Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

Constrained Natural Language Action Planning for Resilient Embodied Systems

Replicating human-level intelligence in the execution of embodied tasks remains challenging due to the unconstrained nature of real-world environments. Novel use of large language models (LLMs) for task planning seeks to address the…

Robotics · Computer Science 2025-10-09 Grayson Byrd, Corban Rivera, Bethany Kemp, Meghan Booker, Aurora Schmidt, Celso M de Melo, Lalithkumar Seenivasan, Mathias Unberath