Related papers: Language-Driven Representation Learning for Roboti…

200 papers

The field of visual representation learning has seen explosive growth in the past years, but its benefits in robotics have been surprisingly limited so far. Prior work uses generic visual representations as a basis to learn (task-specific)…

Robotics · Computer Science 2023-08-16 Jianren Wang, Sudeep Dasari, Mohan Kumar Srirama, Shubham Tulsiani, Abhinav Gupta

Visual model-based reinforcement learning (RL) has the potential to enable sample-efficient robot learning from visual observations. Yet the current approaches typically train a single model end-to-end for learning both visual…

Robotics · Computer Science 2023-05-30 Younggyo Seo, Danijar Hafner, Hao Liu, Fangchen Liu, Stephen James, Kimin Lee, Pieter Abbeel

While visual imitation learning offers one of the most effective ways of learning from visual demonstrations, generalizing from them requires either hundreds of diverse demonstrations, task specific priors, or large, hard-to-train…

Robotics · Computer Science 2021-12-07 Jyothish Pari, Nur Muhammad Shafiullah, Sridhar Pandian Arunachalam, Lerrel Pinto

Visual robotic manipulation research and applications often use multiple cameras, or views, to better perceive the world. How else can we utilize the richness of multi-view data? In this paper, we investigate how to learn good…

Robotics · Computer Science 2023-06-01 Younggyo Seo, Junsu Kim, Stephen James, Kimin Lee, Jinwoo Shin, Pieter Abbeel

A long-standing goal in robotics is to build robots that can perform a wide range of daily tasks from perceptions obtained with their onboard sensors and specified only via natural language. While recently substantial advances have been…

Robotics · Computer Science 2022-08-31 Oier Mees, Lukas Hermann, Wolfram Burgard

Learning visual representations from observing actions to benefit robot visuo-motor policy generation is a promising direction that closely resembles human cognitive function and perception. Motivated by this, and further inspired by…

In this work we propose a novel end-to-end imitation learning approach which combines natural language, vision, and motion information to produce an abstract representation of a task, which in turn is used to synthesize specific motion…

Robotics · Computer Science 2019-11-27 Simon Stepputtis, Joseph Campbell, Mariano Phielipp, Chitta Baral, Heni Ben Amor

Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks. We advocate that such a representation automatically arises from…

The video-based dialog task is a challenging multimodal learning task that has received increasing attention over the past few years, with state-of-the-art methods setting new performance records. This progress is largely powered by the adaptation of…

Computer Vision and Pattern Recognition · Computer Science 2022-10-27 Huda Alamri, Anthony Bilic, Michael Hu, Apoorva Beedu, Irfan Essa

In this work, we explore self-supervised visual pre-training on images from diverse, in-the-wild videos for real-world robotic tasks. Like prior work, our visual representations are pre-trained via a masked autoencoder (MAE), frozen, and…

Robotics · Computer Science 2022-10-07 Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

We propose a visual-linguistic representation learning approach within a self-supervised learning framework by introducing a new operation, loss, and data augmentation strategy. First, we generate diverse features for the image-text…

Computer Vision and Pattern Recognition · Computer Science 2023-04-04 Jaeyoo Park, Bohyung Han

Visual-textual understanding is essential for language-guided robot manipulation. Recent works leverage pre-trained vision-language models to measure the similarity between encoded visual observations and textual instructions, and then…

Robotics · Computer Science 2025-09-30 Chaoran Zhu, Hengyi Wang, Yik Lung Pang, Changjae Oh

In recent years, a number of models that learn the relations between vision and language from large datasets have been released. These models perform a variety of tasks, such as answering questions about images, retrieving sentences that…

Robotics · Computer Science 2024-03-19 Kento Kawaharazuka, Yoshiki Obinata, Naoaki Kanazawa, Kei Okada, Masayuki Inaba

Humans learn powerful representations of objects and scenes by observing how they evolve over time. Yet, outside of specific tasks that require explicit temporal understanding, static image pretraining remains the dominant paradigm for…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 Nikhil Parthasarathy, S. M. Ali Eslami, João Carreira, Olivier J. Hénaff

It is challenging for humans -- particularly those living with physical disabilities -- to control high-dimensional, dexterous robots. Prior work explores learning embedding functions that map a human's low-dimensional inputs (e.g., via a…

Robotics · Computer Science 2021-05-04 Siddharth Karamcheti, Albert J. Zhai, Dylan P. Losey, Dorsa Sadigh

Multimodality Representation Learning, as a technique of learning to embed information from different modalities and their correlations, has achieved remarkable success on a variety of applications, such as Visual Question Answering (VQA),…

Artificial Intelligence · Computer Science 2024-03-04 Muhammad Arslan Manzoor, Sarah Albarri, Ziting Xian, Zaiqiao Meng, Preslav Nakov, Shangsong Liang

Based on the recent advancements in representation learning, we propose a novel pipeline for task-oriented voice-controlled robots with raw sensor inputs. Previous methods rely on a large number of labels and task-specific reward functions.…

Robotics · Computer Science 2023-03-07 Peixin Chang, Shuijing Liu, D. Livingston McPherson, Katherine Driggs-Campbell

Learning effective visual representations for robotic manipulation remains a fundamental challenge due to the complex body dynamics involved in action execution. In this paper, we study how visual representations that carry body-relevant…

Robotics · Computer Science 2026-02-17 Junlin Wang, Zhiyun Lin

Recent works have shown that Large Language Models (LLMs) can be applied to ground natural language to a wide variety of robot skills. However, in practice, learning multi-task, language-conditioned robotic skills typically requires…

Robotics · Computer Science 2023-03-09 Oier Mees, Jessica Borja-Diaz, Wolfram Burgard

The pre-training of visual representations has enhanced the efficiency of robot learning. Due to the lack of large-scale in-domain robotic datasets, prior works utilize in-the-wild human videos to pre-train robotic visual representation.…

Robotics · Computer Science 2024-10-31 Guangqi Jiang, Yifei Sun, Tao Huang, Huanyu Li, Yongyuan Liang, Huazhe Xu