Related papers: Value Explicit Pretraining for Learning Transferab…

Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder

While Reinforcement Learning (RL) agents can successfully learn to handle complex tasks, effectively generalizing acquired skills to unfamiliar settings remains a challenge. One of the reasons behind this is that the visual encoders used are…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-11 · Yuhan Zhang, Guoqing Ma, Guangfu Hao, Liangxuan Guo, Yang Chen, Shan Yu

VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

Reward and representation learning are two long-standing challenges for learning an expanding set of robot manipulation skills from sensory observations. Given the inherent cost and scarcity of in-domain, task-specific robot data, learning…

Robotics · Computer Science · 2023-03-08 · Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Osbert Bastani, Vikash Kumar, Amy Zhang

AVT: Unsupervised Learning of Transformation Equivariant Representations by Autoencoding Variational Transformations

The learning of Transformation-Equivariant Representations (TERs), which was introduced by Hinton et al. (2011), has been considered a principle to reveal visual structures under various transformations. It contains…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-24 · Guo-Jun Qi, Liheng Zhang, Chang Wen Chen, Qi Tian

Pretraining the Vision Transformer using self-supervised methods for vision based Deep Reinforcement Learning

The Vision Transformer architecture has been shown to be competitive in the computer vision (CV) space, where it has dethroned convolution-based networks in several benchmarks. Nevertheless, convolutional neural networks (CNN) remain the…

Machine Learning · Computer Science · 2023-07-20 · Manuel Goulão, Arlindo L. Oliveira

Position Prediction as an Effective Pretraining Strategy

Transformers have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision and Speech Recognition, because of their powerful representational capacity. However, harnessing…

Machine Learning · Computer Science · 2022-07-18 · Shuangfei Zhai, Navdeep Jaitly, Jason Ramapuram, Dan Busbridge, Tatiana Likhomanenko, Joseph Yitan Cheng, Walter Talbott, Chen Huang, Hanlin Goh, Joshua Susskind

A Broad Study on the Transferability of Visual Representations with Contrastive Learning

Tremendous progress has been made in visual representation learning, notably with the recent success of self-supervised contrastive learning methods. Supervised contrastive learning has also been shown to outperform its cross-entropy…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-17 · Ashraful Islam, Chun-Fu Chen, Rameswar Panda, Leonid Karlinsky, Richard Radke, Rogerio Feris

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

Visual navigation is the task of training an embodied agent to navigate intelligently to a target object (e.g., a television) using only visual observations. A key challenge for current deep reinforcement learning models lies in the…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-07 · Juncheng Li, Xin Wang, Siliang Tang, Haizhou Shi, Fei Wu, Yueting Zhuang, William Yang Wang

Self-Supervised Learning via Maximum Entropy Coding

A mainstream type of current self-supervised learning methods pursues a general-purpose representation that can be well transferred to downstream tasks, typically by optimizing on a given pretext task such as instance discrimination. In…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-21 · Xin Liu, Zhongdao Wang, Yali Li, Shengjin Wang

Rethinking Visual Prompt Learning as Masked Visual Token Modeling

Prompt learning has achieved great success in efficiently exploiting large-scale pre-trained models in natural language processing (NLP). It reformulates the downstream tasks as the generative pre-training ones to achieve consistency, thus…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-18 · Ning Liao, Bowen Shi, Xiaopeng Zhang, Min Cao, Junchi Yan, Qi Tian

Self-supervised Pre-training of Text Recognizers

In this paper, we investigate self-supervised pre-training methods for document text recognition. Nowadays, large unlabeled datasets can be collected for many research tasks, including text recognition, but it is costly to annotate them.…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-02 · Martin Kišš, Michal Hradiš

Multimodal Contrastive Training for Visual Representation Learning

We develop an approach to learning visual representations that embraces multimodal data, driven by a combination of intra- and inter-modal similarity preservation objectives. Unlike existing visual pre-training methods, which solve a proxy…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-28 · Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, Baldo Faieta

Pre-trained Visual Dynamics Representations for Efficient Policy Learning

Pre-training for Reinforcement Learning (RL) with purely video data is a valuable yet challenging problem. Although in-the-wild videos are readily available and contain a vast amount of prior world knowledge, the absence of action…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-06 · Hao Luo, Bohan Zhou, Zongqing Lu

The Surprising Effectiveness of Representation Learning for Visual Imitation

While visual imitation learning offers one of the most effective ways of learning from visual demonstrations, generalizing from them requires either hundreds of diverse demonstrations, task-specific priors, or large, hard-to-train…

Robotics · Computer Science · 2021-12-07 · Jyothish Pari, Nur Muhammad Shafiullah, Sridhar Pandian Arunachalam, Lerrel Pinto

Video Prediction Models as Rewards for Reinforcement Learning

Specifying reward signals that allow agents to learn complex behaviors is a long-standing challenge in reinforcement learning. A promising approach is to extract preferences for behaviors from unlabeled videos, which are widely available on…

Machine Learning · Computer Science · 2023-05-31 · Alejandro Escontrela, Ademi Adeniji, Wilson Yan, Ajay Jain, Xue Bin Peng, Ken Goldberg, Youngwoon Lee, Danijar Hafner, Pieter Abbeel

Learning Transferable Pedestrian Representation from Multimodal Information Supervision

Recent research on unsupervised person re-identification (reID) has demonstrated that pre-training on unlabeled person images achieves superior performance on downstream reID tasks compared with pre-training on ImageNet. However, those…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-13 · Liping Bao, Longhui Wei, Xiaoyu Qiu, Wengang Zhou, Houqiang Li, Qi Tian

VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning

Procedural video representation learning is an active research area where the objective is to learn an agent which can anticipate and forecast the future given the present video input, typically in conjunction with textual annotations.…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-07 · Han Lin, Tushar Nagarajan, Nicolas Ballas, Mido Assran, Mojtaba Komeili, Mohit Bansal, Koustuv Sinha

RePo: Resilient Model-Based Reinforcement Learning by Regularizing Posterior Predictability

Visual model-based RL methods typically encode image observations into low-dimensional representations in a manner that does not eliminate redundant information. This leaves them susceptible to spurious variations -- changes in…

Machine Learning · Computer Science · 2023-10-26 · Chuning Zhu, Max Simchowitz, Siri Gadipudi, Abhishek Gupta

Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning

Deep reinforcement learning (RL) algorithms suffer severe performance degradation when the interaction data is scarce, which limits their real-world application. Recently, visual representation learning has been shown to be effective and…

Machine Learning · Computer Science · 2022-08-17 · Yang Yue, Bingyi Kang, Zhongwen Xu, Gao Huang, Shuicheng Yan

Visual Pre-Training on Unlabeled Images using Reinforcement Learning

In reinforcement learning (RL), value-based algorithms learn to associate each observation with the states and rewards that are likely to be reached from it. We observe that many self-supervised image pre-training methods bear similarity to…

Machine Learning · Computer Science · 2025-06-16 · Dibya Ghosh, Sergey Levine

Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training

Learning to navigate in a visual environment following natural-language instructions is a challenging task, because the multimodal inputs to the agent are highly variable, and the training data on a new task is often limited. In this paper,…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-07 · Weituo Hao, Chunyuan Li, Xiujun Li, Lawrence Carin, Jianfeng Gao