English

Class-Weighted Convolutional Features for Visual Instance Search

Computer Vision and Pattern Recognition 2017-07-11 v1 Information Retrieval Multimedia

Abstract

Image retrieval in realistic scenarios targets large dynamic datasets of unlabeled images. In these cases, training or fine-tuning a model every time new images are added to the database is neither efficient nor scalable. Convolutional neural networks trained for image classification over large datasets have been proven effective feature extractors for image retrieval. The most successful approaches are based on encoding the activations of convolutional layers, as they convey the image spatial information. In this paper, we go beyond this spatial information and propose a local-aware encoding of convolutional features based on semantic information predicted in the target image. To this end, we obtain the most discriminative regions of an image using Class Activation Maps (CAMs). CAMs are based on the knowledge contained in the network and therefore, our approach, has the additional advantage of not requiring external information. In addition, we use CAMs to generate object proposals during an unsupervised re-ranking stage after a first fast search. Our experiments on two public available datasets for instance retrieval, Oxford5k and Paris6k, demonstrate the competitiveness of our approach outperforming the current state-of-the-art when using off-the-shelf models trained on ImageNet. The source code and model used in this paper are publicly available at http://imatge-upc.github.io/retrieval-2017-cam/.

Keywords

Cite

@article{arxiv.1707.02581,
  title  = {Class-Weighted Convolutional Features for Visual Instance Search},
  author = {Albert Jimenez and Jose M. Alvarez and Xavier Giro-i-Nieto},
  journal= {arXiv preprint arXiv:1707.02581},
  year   = {2017}
}

Comments

To appear in the British Machine Vision Conference (BMVC), September 2017

R2 v1 2026-06-22T20:41:45.005Z