Related papers: Weakly Supervised Object Localization Using Things…
Weakly Supervised Object Localization (WSOL) methods only require image level labels as opposed to expensive bounding box annotations required by fully supervised algorithms. We study the problem of learning localization model on target…
Most existing approaches to training object detectors rely on fully supervised learning, which requires the tedious manual annotation of object location in a training set. Recently there has been an increasing interest in developing weakly…
In this paper, we propose an effective knowledge transfer framework to boost the weakly supervised object detection accuracy with the help of an external fully-annotated source dataset, whose categories may not overlap with the target…
Object category localization is a challenging problem in computer vision. Standard supervised training requires bounding box annotations of object instances. This time-consuming annotation process is sidestepped in weakly supervised…
Recent advances of deep learning have achieved remarkable performances in various challenging computer vision tasks. Especially in object localization, deep convolutional neural networks outperform traditional approaches based on extraction…
Weakly supervised localization aims at finding target object regions using only image-level supervision. However, localization maps extracted from classification networks are often not accurate due to the lack of fine pixel-level…
We propose to revisit knowledge transfer for training object detectors on target classes from weakly supervised training images, helped by a set of source classes with bounding-box annotations. We present a unified knowledge transfer…
Attribute based knowledge transfer has proven very successful in visual object analysis and learning previously unseen classes. However, the common approach learns and transfers attributes without taking into consideration the embedded…
The status quo approach to training object detectors requires expensive bounding box annotations. Our framework takes a markedly different direction: we transfer tracked object boxes from weakly-labeled videos to weakly-labeled images to…
Weakly-supervised object localization methods tend to fail for object classes that consistently co-occur with the same background elements, e.g. trains on tracks. We propose a method to overcome these failures by adding a very small amount…
Conventional methods for object detection usually require substantial amounts of training data and annotated bounding boxes. If there are only a few training data and annotations, the object detectors easily overfit and fail to generalize.…
We introduce the problem of weakly supervised Multi-Object Tracking and Segmentation, i.e. joint weakly supervised instance segmentation and multi-object tracking, in which we do not provide any kind of mask annotation. To address it, we…
Object detection has achieved promising success, but requires large-scale fully-annotated data, which is time-consuming and labor-extensive. Therefore, we consider object detection with mixed supervision, which learns novel object…
Multisource image analysis that leverages complementary spectral, spatial, and structural information benefits fine-grained object recognition that aims to classify an object into one of many similar subcategories. However, for multisource…
Deep CNN-based object detection systems have achieved remarkable success on several large-scale object detection benchmarks. However, training such detectors requires a large number of labeled bounding boxes, which are more difficult to…
A critical object detection task is finetuning an existing model to detect novel objects, but the standard workflow requires bounding box annotations which are time-consuming and expensive to collect. Weakly supervised object detection…
Methods for object detection and segmentation rely on large scale instance-level annotations for training, which are difficult and time-consuming to collect. Efforts to alleviate this look at varying degrees and quality of supervision.…
When humans describe images they tend to use combinations of nouns and adjectives, corresponding to objects and their associated attributes respectively. To generate such a description automatically, one needs to model objects, attributes…
We propose to model complex visual scenes using a non-parametric Bayesian model learned from weakly labelled images abundant on media sharing sites such as Flickr. Given weak image-level annotations of objects and attributes without…
We consider the problem of unsupervised domain adaptation for semantic segmentation by easing the domain shift between the source domain (synthetic data) and the target domain (real data) in this work. State-of-the-art approaches prove that…