Related papers: Visual Recognition by Request

What Is Considered Complete for Visual Recognition?

This is an opinion paper. We hope to deliver a key message that current visual recognition systems are far from complete, i.e., recognizing everything that humans can recognize, yet it is very unlikely that the gap can be bridged by…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-31 · Lingxi Xie, Xiaopeng Zhang, Longhui Wei, Jianlong Chang, Qi Tian

Exploiting Visual Semantic Reasoning for Video-Text Retrieval

Video retrieval is a challenging research topic bridging the vision and language areas and has attracted broad attention in recent years. Previous works have been devoted to representing videos by directly encoding from frame-level…

Computer Vision and Pattern Recognition · Computer Science · 2020-06-17 · Zerun Feng, Zhimin Zeng, Caili Guo, Zheng Li

Learning Semantics for Visual Place Recognition through Multi-Scale Attention

In this paper we address the task of visual place recognition (VPR), where the goal is to retrieve the correct GPS coordinates of a given query image against a huge geotagged gallery. While recent works have shown that building descriptors…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-26 · Valerio Paolicelli, Antonio Tavera, Carlo Masone, Gabriele Berton, Barbara Caputo

Delving Deeper: Hierarchical Visual Perception for Robust Video-Text Retrieval

Video-text retrieval (VTR) aims to locate relevant videos using natural language queries. Current methods, often based on pre-trained models like CLIP, are hindered by video's inherent redundancy and their reliance on coarse, final-layer…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-21 · Zequn Xie, Boyun Zhang, Yuxiao Lin, Tao Jin

Revisit Anything: Visual Place Recognition via Image Segment Retrieval

Accurately recognizing a revisited place is crucial for embodied agents to localize and navigate. This requires visual representations to be distinct, despite strong variations in camera viewpoint and scene appearance. Existing visual place…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-27 · Kartik Garg, Sai Shubodh Puligilla, Shishir Kolathaya, Madhava Krishna, Sourav Garg

Are Local Features All You Need for Cross-Domain Visual Place Recognition?

Visual Place Recognition is a task that aims to predict the coordinates of an image (called query) based solely on visual clues. Most commonly, a retrieval approach is adopted, where the query is matched to the most similar images from a…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-13 · Giovanni Barbarani, Mohamad Mostafa, Hajali Bayramov, Gabriele Trivigno, Gabriele Berton, Carlo Masone, Barbara Caputo

From Recognition to Cognition: Visual Commonsense Reasoning

Visual understanding goes well beyond object recognition. With one glance at an image, we can effortlessly imagine the world beyond the pixels: for instance, we can infer people's actions, goals, and mental states. While this task is easy…

Computer Vision and Pattern Recognition · Computer Science · 2019-03-27 · Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi

VirTex: Learning Visual Representations from Textual Annotations

The de-facto approach to many vision tasks is to start from pretrained visual representations, typically learned via supervised training on ImageNet. Recent methods have explored unsupervised pretraining to scale to vast quantities of…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-28 · Karan Desai, Justin Johnson

Beyond Embeddings: The Promise of Visual Table in Visual Reasoning

Visual representation learning has been a cornerstone in computer vision, involving typical forms such as visual embeddings, structural symbols, and text-based representations. Despite the success of CLIP-type visual embeddings, they often…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-18 · Yiwu Zhong, Zi-Yuan Hu, Michael R. Lyu, Liwei Wang

Context-Based Visual-Language Place Recognition

In vision-based robot localization and SLAM, Visual Place Recognition (VPR) is essential. This paper addresses the problem of VPR, which involves accurately recognizing the location corresponding to a given query image. A popular approach…

Robotics · Computer Science · 2024-10-28 · Soojin Woo, Seong-Woo Kim

Generic decoding of seen and imagined objects using hierarchical visual features

Object recognition is a key function in both human and machine vision. While recent studies have achieved fMRI decoding of seen and imagined contents, the prediction is limited to training examples. We present a decoding approach for…

Neurons and Cognition · Quantitative Biology · 2016-09-28 · Tomoyasu Horikawa, Yukiyasu Kamitani

VQD: Visual Query Detection in Natural Scenes

We propose Visual Query Detection (VQD), a new visual grounding task. In VQD, a system is guided by natural language to localize a variable number of objects in an image. VQD is related to visual referring expression recognition, where the…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-15 · Manoj Acharya, Karan Jariwala, Christopher Kanan

VisRet: Visualization Improves Knowledge-Intensive Text-to-Image Retrieval

Text-to-image retrieval (T2I retrieval) remains challenging because cross-modal embeddings often behave as bags of concepts, underrepresenting structured visual relationships such as pose and viewpoint. We propose Visualize-then-Retrieve…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-28 · Di Wu, Yixin Wan, Kai-Wei Chang

ViRel: Unsupervised Visual Relations Discovery with Graph-level Analogy

Visual relations form the basis of understanding our compositional world, as relationships between visual objects capture key information in a scene. It is then advantageous to learn relations automatically from the data, as learning with…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-05 · Daniel Zeng, Tailin Wu, Jure Leskovec

QualiRAG: Retrieval-Augmented Generation for Visual Quality Understanding

Visual quality assessment (VQA) is increasingly shifting from scalar score prediction toward interpretable quality understanding, a paradigm that demands fine-grained spatiotemporal perception and auxiliary contextual…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-27 · Linhan Cao, Wei Sun, Weixia Zhang, Xiangyang Zhu, Kaiwei Zhang, Jun Jia, Dandan Zhu, Guangtao Zhai, Xiongkuo Min

VizRec: A framework for secure data exploration via visual representation

Visual representations of data (visualizations) are tools of great importance and widespread use in data analytics as they provide users visual insight to patterns in the observed data in a simple and effective way. However, since…

Databases · Computer Science · 2018-11-05 · Lorenzo De Stefani, Leonhard F. Spiegelberg, Tim Kraska, Eli Upfal

A Benchmark for Compositional Visual Reasoning

A fundamental component of human vision is our ability to parse complex visual scenes and judge the relations between their constituent objects. AI benchmarks for visual reasoning have driven rapid progress in recent years with…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-14 · Aimen Zerroug, Mohit Vaishnav, Julien Colin, Sebastian Musslick, Thomas Serre

Visual Probing and Correction of Object Recognition Models with Interactive user feedback

With the advent of state-of-the-art machine learning and deep learning technologies, several industries are moving towards the field. Applications of such technologies are highly diverse ranging from natural language processing to computer…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-01 · Viny Saajan Victor, Pramod Vadiraja, Jan-Tobias Sohns, Heike Leitte

Coarse-to-Fine Reasoning for Visual Question Answering

Bridging the semantic gap between image and question is an important step to improve the accuracy of the Visual Question Answering (VQA) task. However, most of the existing VQA methods focus on attention mechanisms or visual relations for…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-20 · Binh X. Nguyen, Tuong Do, Huy Tran, Erman Tjiputra, Quang D. Tran, Anh Nguyen

Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models

Pretrained vision-language models, such as CLIP, have demonstrated strong generalization capabilities, making them promising tools in the realm of zero-shot visual recognition. Visual relation detection (VRD) is a typical task that…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-29 · Lin Li, Jun Xiao, Guikun Chen, Jian Shao, Yueting Zhuang, Long Chen