Related papers: Matching Multiple Perspectives for Efficient Repre…
It is known that representations from self-supervised pre-training can perform on par, and often better, on various downstream tasks than representations from fully-supervised pre-training. This has been shown in a host of settings such as…
Research in child development has shown that embodied experience handling physical objects contributes to many cognitive abilities, including visual learning. One characteristic of such experience is that the learner sees the same object…
In this work, we study different approaches to self-supervised pretraining of object detection models. We first design a general framework to learn a spatially consistent dense representation from an image, by randomly sampling and…
As a subset of unsupervised representation learning, self-supervised representation learning adopts self-defined signals as supervision and uses the learned representation for downstream tasks, such as object detection and image captioning.…
We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context. For example, one could produce multiple views of a local…
Perceptual understanding of the scene and the relationship between its different components is important for successful completion of robotic tasks. Representation learning has been shown to be a powerful technique for this, but most of the…
Self-supervision allows learning meaningful representations of natural images, which usually contain one central object. How well does it transfer to multi-entity scenes? We discuss key aspects of learning structured object-centric…
Well structured visual representations can make robot learning faster and can improve generalization. In this paper, we study how we can acquire effective object-centric representations for robotic manipulation tasks without human labeling…
Self-supervised learning has become a popular approach in recent years for its ability to learn meaningful representations without the need for data annotation. This paper proposes a novel image augmentation technique, overlaying images,…
Self-supervision has emerged as a propitious method for visual representation learning after the recent paradigm shift from handcrafted pretext tasks to instance-similarity based approaches. Most state-of-the-art methods enforce similarity…
In robotic applications, we often face the challenge of discovering new objects while having very little or no labelled training data. In this paper we explore the use of self-supervision provided by a robot traversing an environment to…
Visual place recognition is a key to unlocking spatial navigation for animals, humans and robots. While state-of-the-art approaches are trained in a supervised manner and therefore hardly capture the information needed for generalizing to…
Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even…
Progress in self-supervised learning has brought strong general image representation learning methods. Yet so far, it has mostly focused on image-level learning. In turn, tasks such as unsupervised image segmentation have not benefited from…
Despite their irresistible success, deep learning algorithms still heavily rely on annotated data. On the other hand, unsupervised settings pose many challenges, especially about determining the right inductive bias in diverse scenarios.…
Learning meaningful representations is at the heart of many tasks in the field of modern machine learning. Recently, a lot of methods were introduced that allow learning of image representations without supervision. These representations…
The recent success in human action recognition with deep learning methods mostly adopt the supervised learning paradigm, which requires significant amount of manually labeled data to achieve good performance. However, label collection is an…
The generalization ability of imitation learning policies for robotic manipulation is fundamentally constrained by the diversity of expert demonstrations, while collecting demonstrations across varied environments is costly and difficult in…
Self-supervised representation learning is able to learn semantically meaningful features; however, much of its recent success relies on multiple crops of an image with very few objects. Instead of learning view-invariant representation…
3D object representation learning is a fundamental challenge in computer vision to infer about the 3D world. Recent advances in deep learning have shown their efficiency in 3D object recognition, among which view-based methods have…