Related papers: ICAR: Image-based Complementary Auto Reasoning

FAR-Net: Multi-Stage Fusion Network with Enhanced Semantic Alignment and Adaptive Reconciliation for Composed Image Retrieval

Composed image retrieval (CIR) is a vision language task that retrieves a target image using a reference image and modification text, enabling intuitive specification of desired changes. While effectively fusing visual and textual…

Computer Vision and Pattern Recognition · Computer Science 2025-07-18 Jeong-Woo Park, Young-Eun Kim, Seong-Whan Lee

Image Complexity-Aware Adaptive Retrieval for Efficient Vision-Language Models

Vision transformers in vision-language models typically use the same amount of compute for every image, regardless of whether it is simple or complex. We propose ICAR (Image Complexity-Aware Retrieval), an adaptive computation approach that…

Information Retrieval · Computer Science 2026-01-16 Mikel Williams-Lekuona, Georgina Cosma

FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data

Composed Image Retrieval (CIR) retrieves target images using a reference image paired with modification text. Despite rapid advances, all existing methods and datasets operate at the image level -- a single reference image plus modification…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Peng Yuan, Bingyin Mei, Hui Zhang

FBCIR: Balancing Cross-Modal Focuses in Composed Image Retrieval

Composed image retrieval (CIR) requires multi-modal models to jointly reason over visual content and semantic modifications presented in text-image input pairs. While current CIR models achieve strong performance on common benchmark cases,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-13 Chenchen Zhao, Jianhuan Zhuo, Muxi Chen, Zhaohua Zhang, Wenyu Jiang, Tianwen Jiang, Qiuyong Xiao, Jihong Zhang, Qiang Xu

Forward Compatible Training for Large-Scale Embedding Retrieval Systems

In visual retrieval systems, updating the embedding model requires recomputing features for every piece of data. This expensive process is referred to as backfilling. Recently, the idea of backward compatible training (BCT) was proposed. To…

Computer Vision and Pattern Recognition · Computer Science 2022-03-31 Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari

FIRE-CIR: Fine-grained Reasoning for Composed Fashion Image Retrieval

Composed image retrieval (CIR) aims to retrieve a target image that depicts a reference image modified by a textual description. While recent vision-language models (VLMs) achieve promising CIR performance by embedding images and text into…

Computer Vision and Pattern Recognition · Computer Science 2026-04-13 François Gardères, Camille-Sovanneary Gauthier, Jean Ponce, Shizhe Chen

iCAR: Bridging Image Classification and Image-text Alignment for Visual Recognition

Image classification, which classifies images by pre-defined categories, has been the dominant approach to visual representation learning over the last decade. Visual learning through image-text alignment, however, has emerged to show…

Computer Vision and Pattern Recognition · Computer Science 2022-04-25 Yixuan Wei, Yue Cao, Zheng Zhang, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo

Quantum Inverse Contextual Vision Transformers (Q-ICVT): A New Frontier in 3D Object Detection for AVs

The field of autonomous vehicles (AVs) predominantly leverages multi-modal integration of LiDAR and camera data to achieve better performance compared to using a single modality. However, the fusion process encounters challenges in…

Computer Vision and Pattern Recognition · Computer Science 2024-08-22 Sanjay Bhargav Dharavath, Tanmoy Dam, Supriyo Chakraborty, Prithwiraj Roy, Aniruddha Maiti

IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception

Multi-agent collaborative perception has emerged as a widely recognized technology in the field of autonomous driving in recent years. However, current collaborative perception predominantly relies on LiDAR point clouds, with significantly…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Shaohong Wang, Lu Bin, Xinyu Xiao, Zhiyu Xiang, Hangguan Shan, Eryun Liu

A Competence-aware Curriculum for Visual Concepts Learning via Question Answering

Humans can progressively learn visual concepts from easy to hard questions. To mimic this efficient learning ability, we propose a competence-aware curriculum for visual concept learning in a question-answering manner. Specifically, we…

Computer Vision and Pattern Recognition · Computer Science 2020-07-29 Qing Li, Siyuan Huang, Yining Hong, Song-Chun Zhu

Generative Editing in the Joint Vision-Language Space for Zero-Shot Composed Image Retrieval

Composed Image Retrieval (CIR) enables fine-grained visual search by combining a reference image with a textual modification. While supervised CIR methods achieve high accuracy, their reliance on costly triplet annotations motivates…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Xin Wang, Haipeng Zhang, Mang Li, Zhaohui Xia, Yueguo Chen, Yu Zhang, Chunyu Wei

FusionBERT: Multi-View Image-3D Retrieval via Cross-Attention Visual Fusion and Normal-Aware 3D Encoder

We propose FusionBERT, a novel multi-view visual fusion framework for image-3D multimodal retrieval. Existing image-3D representation learning methods predominantly focus on feature alignment of a single object image and its 3D model,…

Computer Vision and Pattern Recognition · Computer Science 2026-04-06 Wei Li, Yufan Ren, Hanqing Jiang, Jianhui Ding, Zhen Peng, Leman Feng, Yichun Shentu, Guoqiang Xu, Baigui Sun

Pseudo-triplet Guided Few-shot Composed Image Retrieval

Composed Image Retrieval (CIR) is a challenging task that aims to retrieve the target image with a multimodal query, i.e., a reference image, and its complementary modification text. As previous supervised or zero-shot learning paradigms…

Computer Vision and Pattern Recognition · Computer Science 2024-11-13 Bohan Hou, Haoqiang Lin, Haokun Wen, Meng Liu, Mingzhu Xu, Xuemeng Song

A Hybrid Multimodal Deep Learning Framework for Intelligent Fashion Recommendation

The rapid expansion of online fashion platforms has created an increasing demand for intelligent recommender systems capable of understanding both visual and textual cues. This paper proposes a hybrid multimodal deep learning framework for…

Information Retrieval · Computer Science 2025-11-20 Kamand Kalashi, Babak Teimourpour

A Sanity Check on Composed Image Retrieval

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image, and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, their performance is…

Computer Vision and Pattern Recognition · Computer Science 2026-04-15 Yikun Liu, Jiangchao Yao, Weidi Xie, Yanfeng Wang

VICTOR: Visual Incompatibility Detection with Transformers and Fashion-specific contrastive pre-training

For fashion outfits to be considered aesthetically pleasing, the garments that constitute them need to be compatible in terms of visual aspects, such as style, category and color. Previous works have defined visual compatibility as a binary…

Computer Vision and Pattern Recognition · Computer Science 2022-09-09 Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Ioannis Kompatsiaris

Brain-like Flexible Visual Inference by Harnessing Feedback-Feedforward Alignment

In natural vision, feedback connections support versatile visual inference capabilities such as making sense of the occluded or noisy bottom-up sensory information or mediating pure top-down processes such as imagination. However, the…

Neurons and Cognition · Quantitative Biology 2023-11-01 Tahereh Toosi, Elias B. Issa

Towards Backward-Compatible Representation Learning

We propose a way to learn visual features that are compatible with previously computed ones even when they have different dimensions and are learned via different neural network architectures and loss functions. Compatible means that, if…

Computer Vision and Pattern Recognition · Computer Science 2021-01-07 Yantao Shen, Yuanjun Xiong, Wei Xia, Stefano Soatto

Instance-Level Composed Image Retrieval

The progress of composed image retrieval (CIR), a popular research direction in image retrieval, where a combined visual and textual query is used, is held back by the absence of high-quality training and evaluation data. We introduce a new…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Bill Psomas, George Retsinas, Nikos Efthymiadis, Panagiotis Filntisis, Yannis Avrithis, Petros Maragos, Ondrej Chum, Giorgos Tolias

CIR-CoT: Towards Interpretable Composed Image Retrieval via End-to-End Chain-of-Thought Reasoning

Composed Image Retrieval (CIR), which aims to find a target image from a reference image and a modification text, presents the core challenge of performing unified reasoning across visual and semantic modalities. While current approaches…

Computer Vision and Pattern Recognition · Computer Science 2025-10-10 Weihuang Lin, Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji