Related papers: Programmatically Grounded, Compositionally General…

200 papers

Vision-language-action (VLA) models finetuned from vision-language models (VLMs) hold the promise of leveraging rich pretrained representations to build generalist robots across diverse tasks and environments. However, direct fine-tuning on…

Robotics · Computer Science 2025-09-18 Shresth Grover, Akshay Gopalkrishnan, Bo Ai, Henrik I. Christensen, Hao Su, Xuanlin Li

Given a natural language instruction and an input scene, our goal is to train a model to output a manipulation program that can be executed by the robot. Prior approaches for this task possess one of the following limitations: (i) rely on…

Robotic manipulation faces a significant challenge in generalizing across unseen objects, environments and tasks specified by diverse language instructions. To improve generalization capabilities, recent research has incorporated large…

Robotics · Computer Science 2025-06-16 Shizhe Chen, Ricardo Garcia, Paul Pacaud, Cordelia Schmid

How can we imbue robots with the ability to manipulate objects precisely but also to reason about them in terms of abstract concepts? Recent works in manipulation have shown that end-to-end networks can learn dexterous skills that require…

Robotics · Computer Science 2021-09-27 Mohit Shridhar, Lucas Manuelli, Dieter Fox

Understanding human instructions and accomplishing Vision-Language Navigation tasks in unknown environments is essential for robots. However, existing modular approaches heavily rely on the quality of training data and often exhibit poor…

Robotics · Computer Science 2025-09-30 Yao Wang, Zhirui Sun, Wenzheng Chi, Baozhi Jia, Wenjun Xu, Jiankun Wang

Vision-Language Model (VLM) is an important component to enable robust robot manipulation. Yet, using it to translate human instructions into an action-resolvable intermediate representation often needs a tradeoff between…

Pretrained vision-language models (VLMs) can make semantic and visual inferences across diverse settings, providing valuable common-sense priors for robotic control. However, effectively grounding this knowledge in robot behaviors remains…

Language model (LM) pre-training is useful in many language processing tasks. But can pre-trained LMs be further leveraged for more general machine learning problems? We propose an approach for using LMs to scaffold learning and…

Recent advances in legged locomotion learning are still dominated by the utilization of geometric representations of the environment, limiting the robot's capability to respond to higher-level semantics such as human instructions. To…

Robotics · Computer Science 2026-02-12 I Made Aswin Nahrendra, Seunghyun Lee, Dongkyu Lee, Hyun Myung

The control of robots for manipulation tasks generally relies on visual input. Recent advances in vision-language models (VLMs) enable the use of natural language instructions to condition visual input and control robots in a wider range of…

Robotics · Computer Science 2025-08-05 Chenglin Cui, Chaoran Zhu, Changjae Oh, Andrea Cavallaro

Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize across a wide spectrum of behaviors, enabling a single policy to act in varied real-world environments. However, they still fall…

Robotics · Computer Science 2026-03-03 Yajat Yadav, Zhiyuan Zhou, Andrew Wagenmaker, Karl Pertsch, Sergey Levine

Generalization to unseen real-world scenarios for robot manipulation requires exposure to diverse datasets during training. However, collecting large real-world datasets is intractable due to high operational costs. For robot learning to…

Robotics · Computer Science 2024-09-04 Zoey Chen, Zhao Mandi, Homanga Bharadhwaj, Mohit Sharma, Shuran Song, Abhishek Gupta, Vikash Kumar

Large Language Models (LLMs) and strong vision models have enabled rapid research and development in the field of Vision-Language-Action models that enable robotic control. The main objective of these methods is to develop a generalist…

Foundation models have received much attention due to their effectiveness across a broad range of downstream applications. Though there is a big convergence in terms of architecture, most pretrained models are typically still developed for…

Computation and Language · Computer Science 2022-06-14 Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

Learning visual representations from observing actions to benefit robot visuo-motor policy generation is a promising direction that closely resembles human cognitive function and perception. Motivated by this, and further inspired by…

Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and planning. Despite the progress, most still rely on pre-defined motion primitives to…

Robotics · Computer Science 2023-11-03 Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei

General-purpose robotic manipulation, including reach and grasp, is essential for deployment into households and workspaces involving diverse and evolving tasks. Recent advances propose using large pre-trained models, such as Large Language…

Robotics · Computer Science 2025-07-16 Huiyi Wang, Fahim Shahriar, Alireza Azimi, Gautham Vasan, Rupam Mahmood, Colin Bellinger

Grounding the common-sense reasoning of Large Language Models (LLMs) in physical domains remains a pivotal yet unsolved problem for embodied AI. Whereas prior works have focused on leveraging LLMs directly for planning in symbolic spaces,…

Robotics · Computer Science 2024-12-10 Yanwei Wang, Tsun-Hsuan Wang, Jiayuan Mao, Michael Hagenow, Julie Shah

Foundation models (FMs) are increasingly used to bridge language and action in embodied agents, yet the operational characteristics of different FM integration strategies remain under-explored -- particularly for complex instruction…

Robotics · Computer Science 2025-11-04 Xiuchao Sui, Daiying Tian, Qi Sun, Ruirui Chen, Dongkyu Choi, Kenneth Kwok, Soujanya Poria

Recent advancements in robotic manipulation have highlighted the potential of intermediate representations for improving policy generalization. In this work, we explore grounding masks as an effective intermediate representation, balancing…

Robotics · Computer Science 2025-05-01 Haifeng Huang, Xinyi Chen, Yilun Chen, Hao Li, Xiaoshen Han, Zehan Wang, Tai Wang, Jiangmiao Pang, Zhou Zhao