Related papers: Multimodal Query-guided Object Localization

Sketch-Guided Object Localization in Natural Images

We introduce the novel problem of localizing all the instances of an object (seen or unseen during training) in a natural image via sketch query. We refer to this problem as sketch-guided object localization. This problem is distinctively…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-18 · Aditay Tripathi, Rajath R Dani, Anand Mishra, Anirban Chakraborty

Query-guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch

In this work, we investigate the problem of sketch-based object localization on natural images, where given a crude hand-drawn sketch of an object, the goal is to localize all the instances of the same object on the target image. This…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-16 · Aditay Tripathi, Anand Mishra, Anirban Chakraborty

Localizing Infinity-shaped fishes: Sketch-guided object localization in the wild

This work investigates the problem of sketch-guided object localization (SGOL), where human sketches are used as queries to conduct the object localization in natural images. In this cross-modal setting, we first contribute with a…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-27 · Pau Riba, Sounak Dey, Ali Furkan Biten, Josep Llados

Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

In this work we introduce a cross modal image retrieval system that allows both text and sketch as input modalities for the query. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities…

Computer Vision and Pattern Recognition · Computer Science · 2018-05-01 · Sounak Dey, Anjan Dutta, Suman K. Ghosh, Ernest Valveny, Josep Lladós, Umapada Pal

Telling the What while Pointing to the Where: Multimodal Queries for Image Retrieval

Most existing image retrieval systems use text queries as a way for the user to express what they are looking for. However, fine-grained image retrieval often requires the ability to also express where in the image the content they are…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-26 · Soravit Changpinyo, Jordi Pont-Tuset, Vittorio Ferrari, Radu Soricut

Sketch-based Video Object Localization

We introduce Sketch-based Video Object Localization (SVOL), a new task aimed at localizing spatio-temporal object boxes in video queried by the input sketch. We first outline the challenges in the SVOL task and build the Sketch-Video…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-30 · Sangmin Woo, So-Yeong Jeon, Jinyoung Park, Minji Son, Sumin Lee, Changick Kim

Let Human Sketches Help: Empowering Challenging Image Segmentation Task with Freehand Sketches

Sketches, with their expressive potential, allow humans to convey the essence of an object through even a rough contour. For the first time, we harness this expressive potential to improve segmentation performance in challenging tasks like…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-03 · Ying Zang, Runlong Cao, Jianqi Zhang, Yidong Han, Ziyue Cao, Wenjun Hu, Didi Zhu, Lanyun Zhu, Zejian Li, Deyi Ji, Tianrun Chen

Towards Accurate Localization by Instance Search

Visual object localization is the key step in a series of object detection tasks. In the literature, high localization accuracy is achieved with the mainstream strongly supervised frameworks. However, such methods require object-level…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-10 · Yi-Geng Hong, Hui-Chu Xiao, Wan-Lei Zhao

Multimodal Referring Segmentation: A Survey

Multimodal referring segmentation aims to segment target objects in visual scenes, such as images, videos, and 3D scenes, based on referring expressions in text or audio format. This task plays a crucial role in practical applications…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-06 · Henghui Ding, Song Tang, Shuting He, Chang Liu, Zuxuan Wu, Yu-Gang Jiang

Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection

Keypoint detection, integral to modern machine perception, faces challenges in few-shot learning, particularly when source data from the same distribution as the query is unavailable. This gap is addressed by leveraging sketches, a popular…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-29 · Subhajit Maity, Ayan Kumar Bhunia, Subhadeep Koley, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song

xMOD: Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D motion

Object discovery, which refers to the task of localizing objects without human annotations, has gained significant attention in 2D image analysis. However, despite this growing interest, it remains under-explored in 3D data, where…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-20 · Saad Lahlali, Sandra Kara, Hejer Ammar, Florian Chabot, Nicolas Granger, Hervé Le Borgne, Quoc-Cuong Pham

SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

We introduce a novel problem, i.e., the localization of an input image within a multi-modal reference map represented by a database of 3D scene graphs. These graphs comprise multiple modalities, including object-level point clouds, images,…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-15 · Yang Miao, Francis Engelmann, Olga Vysotska, Federico Tombari, Marc Pollefeys, Dániel Béla Baráth

I Know What You Draw: Learning Grasp Detection Conditioned on a Few Freehand Sketches

In this paper, we are interested in the problem of generating target grasps by understanding freehand sketches. The sketch is useful for the persons who cannot formulate language and the cases where a textual description is not available on…

Robotics · Computer Science · 2022-05-10 · Haitao Lin, Chilam Cheang, Yanwei Fu, Xiangyang Xue

Object category understanding via eye fixations on freehand sketches

The study of eye gaze fixations on photographic images is an active research area. In contrast, the image subcategory of freehand sketches has not received as much attention for such studies. In this paper, we analyze the results of a…

Computer Vision and Pattern Recognition · Computer Science · 2017-08-09 · Ravi Kiran Sarvadevabhatla, Sudharshan Suresh, R. Venkatesh Babu

Towards Open-Set Object Detection and Discovery

With the human pursuit of knowledge, open-set object detection (OSOD) has been designed to identify unknown objects in a dynamic world. However, an issue with the current setting is that all the predicted unknown objects share the same…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-13 · Jiyang Zheng, Weihao Li, Jie Hong, Lars Petersson, Nick Barnes

Sketch-based Image Retrieval from Millions of Images under Rotation, Translation and Scale Variations

Proliferation of touch-based devices has made sketch-based image retrieval practical. While many methods exist for sketch-based object detection/image retrieval on small datasets, relatively less work has been done on large (web)-scale…

Computer Vision and Pattern Recognition · Computer Science · 2015-11-03 · Sarthak Parui, Anurag Mittal

Fast Object Localization Using a CNN Feature Map Based Multi-Scale Search

Object localization is an important task in computer vision but requires a large amount of computational power due mainly to an exhaustive multiscale search on the input image. In this paper, we describe a near real-time multiscale search…

Computer Vision and Pattern Recognition · Computer Science · 2016-04-14 · Hyungtae Lee, Heesung Kwon, Archith J. Bency, William D. Nothwang

Object as Query: Lifting any 2D Object Detector to 3D Detection

3D object detection from multi-view images has drawn much attention over the past few years. Existing methods mainly establish 3D representations from multi-view images and adopt a dense detection head for object detection, or employ object…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-07 · Zitian Wang, Zehao Huang, Jiahui Fu, Naiyan Wang, Si Liu

Few-shot Object Localization

Existing object localization methods are tailored to locate specific classes of objects, relying heavily on abundant labeled data for model optimization. However, acquiring large amounts of labeled data is challenging in many real-world…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-06 · Yunhan Ren, Bo Li, Chengyang Zhang, Yong Zhang, Baocai Yin

Localizing Object-level Shape Variations with Text-to-Image Diffusion Models

Text-to-image models give rise to workflows which often begin with an exploration step, where users sift through a large collection of generated images. The global nature of the text-to-image generation process prevents users from narrowing…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-15 · Or Patashnik, Daniel Garibi, Idan Azuri, Hadar Averbuch-Elor, Daniel Cohen-Or