Related papers: Learning Manipulation by Predicting Interaction

Manipulate by Seeing: Creating Manipulation Controllers from Pre-Trained Representations

The field of visual representation learning has seen explosive growth in the past years, but its benefits in robotics have been surprisingly limited so far. Prior work uses generic visual representations as a basis to learn (task-specific)…

Robotics · Computer Science 2023-08-16 Jianren Wang, Sudeep Dasari, Mohan Kumar Srirama, Shubham Tulsiani, Abhinav Gupta

Proactive Human-Robot Interaction using Visuo-Lingual Transformers

Humans possess the innate ability to extract latent visuo-lingual cues to infer context through human interaction. During collaboration, this enables proactive prediction of the underlying intention of a series of tasks. In contrast,…

Robotics · Computer Science 2023-10-05 Pranay Mathur

Gripper Keypose and Object Pointflow as Interfaces for Bimanual Robotic Manipulation

Bimanual manipulation is a challenging yet crucial robotic capability, demanding precise spatial localization and versatile motion trajectories, which pose significant challenges to existing approaches. Existing approaches fall into two…

Robotics · Computer Science 2025-04-25 Yuyin Yang, Zetao Cai, Yang Tian, Jia Zeng, Jiangmiao Pang

Collaborative Motion Prediction via Neural Motion Message Passing

Motion prediction is essential and challenging for autonomous vehicles and social robots. One challenge of motion prediction is to model the interaction among traffic actors, which could cooperate with each other to avoid collisions or form…

Computer Vision and Pattern Recognition · Computer Science 2020-03-17 Yue Hu, Siheng Chen, Ya Zhang, Xiao Gu

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets

The pre-training of visual representations has enhanced the efficiency of robot learning. Due to the lack of large-scale in-domain robotic datasets, prior works utilize in-the-wild human videos to pre-train robotic visual representation.…

Robotics · Computer Science 2024-10-31 Guangqi Jiang, Yifei Sun, Tao Huang, Huanyu Li, Yongyuan Liang, Huazhe Xu

VIP: Vision Instructed Pre-training for Robotic Manipulation

The effectiveness of scaling up training data in robotic manipulation is still limited. A primary challenge in manipulation is the tasks are diverse, and the trained policy would be confused if the task targets are not specified clearly.…

Robotics · Computer Science 2025-02-12 Zhuoling Li, Liangliang Ren, Jinrong Yang, Yong Zhao, Xiaoyang Wu, Zhenhua Xu, Xiang Bai, Hengshuang Zhao

Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods

Visual pre-training with large-scale real-world data has made great progress in recent years, showing great potential in robot learning with pixel observations. However, the recipes of visual pre-training for robot manipulation tasks are…

Robotics · Computer Science 2023-08-08 Ya Jing, Xuelin Zhu, Xingbin Liu, Qie Sima, Taozheng Yang, Yunhai Feng, Tao Kong

Active Perception and Representation for Robotic Manipulation

The vast majority of visual animals actively control their eyes, heads, and/or bodies to direct their gaze toward different parts of their environment. In contrast, recent applications of reinforcement learning in robotic manipulation…

Computer Vision and Pattern Recognition · Computer Science 2020-03-17 Youssef Zaky, Gaurav Paruthi, Bryan Tripp, James Bergstra

Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

Visual representations play a crucial role in developing generalist robotic policies. Previous vision encoders, typically pre-trained with single-image reconstruction or two-image contrastive learning, tend to capture static information,…

Computer Vision and Pattern Recognition · Computer Science 2025-05-06 Yucheng Hu, Yanjiang Guo, Pengchao Wang, Xiaoyu Chen, Yen-Jen Wang, Jianke Zhang, Koushil Sreenath, Chaochao Lu, Jianyu Chen

A System for Imitation Learning of Contact-Rich Bimanual Manipulation Policies

In this paper, we discuss a framework for teaching bimanual manipulation tasks by imitation. To this end, we present a system and algorithms for learning compliant and contact-rich robot behavior from human demonstrations. The presented…

Robotics · Computer Science 2022-08-02 Simon Stepputtis, Maryam Bandari, Stefan Schaal, Heni Ben Amor

Unsupervised Learning for Physical Interaction through Video Prediction

A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information.…

Machine Learning · Computer Science 2016-10-19 Chelsea Finn, Ian Goodfellow, Sergey Levine

Learning User Preferences via Reinforcement Learning with Spatial Interface Valuing

Interactive Machine Learning is concerned with creating systems that operate in environments alongside humans to achieve a task. A typical use is to extend or amplify the capabilities of a human in cognitive or physical ways, requiring the…

Machine Learning · Computer Science 2019-02-05 Miguel Alonso

Perceive, Interact, Predict: Learning Dynamic and Static Clues for End-to-End Motion Prediction

Motion prediction is highly relevant to the perception of dynamic objects and static map elements in the scenarios of autonomous driving. In this work, we propose PIP, the first end-to-end Transformer-based framework which jointly and…

Computer Vision and Pattern Recognition · Computer Science 2022-12-06 Bo Jiang, Shaoyu Chen, Xinggang Wang, Bencheng Liao, Tianheng Cheng, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang

What Makes Pre-Trained Visual Representations Successful for Robust Manipulation?

Inspired by the success of transfer learning in computer vision, roboticists have investigated visual pre-training as a means to improve the learning efficiency and generalization ability of policies learned from pixels. To that end, past…

Computer Vision and Pattern Recognition · Computer Science 2023-12-21 Kaylee Burns, Zach Witzel, Jubayer Ibn Hamid, Tianhe Yu, Chelsea Finn, Karol Hausman

PIP: Physical Interaction Prediction via Mental Simulation with Span Selection

Accurate prediction of physical interaction outcomes is a crucial component of human intelligence and is important for safe and efficient deployments of robots in the real world. While there are existing vision-based intuitive physics…

Computer Vision and Pattern Recognition · Computer Science 2021-11-30 Jiafei Duan, Samson Yu, Soujanya Poria, Bihan Wen, Cheston Tan

Probabilistic Multimodal Modeling for Human-Robot Interaction Tasks

Human-robot interaction benefits greatly from multimodal sensor inputs as they enable increased robustness and generalization accuracy. Despite this observation, few HRI methods are capable of efficiently performing inference for multimodal…

Robotics · Computer Science 2019-08-15 Joseph Campbell, Simon Stepputtis, Heni Ben Amor

Learning Predictive Models From Observation and Interaction

Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works, and then use this learned model to plan coordinated sequences of actions to bring about desired outcomes.…

Machine Learning · Computer Science 2020-01-01 Karl Schmeckpeper, Annie Xie, Oleh Rybkin, Stephen Tian, Kostas Daniilidis, Sergey Levine, Chelsea Finn

Learning robot motor skills with mixed reality

Mixed Reality (MR) has recently shown great success as an intuitive interface for enabling end-users to teach robots. Related works have used MR interfaces to communicate robot intents and beliefs to a co-located human, as well as developed…

Robotics · Computer Science 2022-03-23 Eric Rosen, Sreehari Rammohan, Devesh Jha

Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following

We consider the problem of learning to map from natural language instructions to state transitions (actions) in a data-efficient manner. Our method takes inspiration from the idea that it should be easier to ground language to concepts that…

Computation and Language · Computer Science 2019-07-24 David Gaddy, Dan Klein

Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning

Prompt-based learning has been demonstrated as a compelling paradigm contributing to large language models' tremendous success (LLMs). Inspired by their success in language tasks, existing research has leveraged LLMs in embodied instruction…

Robotics · Computer Science 2024-05-29 Jiachen Li, Qiaozi Gao, Michael Johnston, Xiaofeng Gao, Xuehai He, Suhaila Shakiah, Hangjie Shi, Reza Ghanadan, William Yang Wang