Related papers: Instance-Level Composed Image Retrieval

ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval

Composed image retrieval (CIR) is the task of retrieving a target image specified by a query image and a relative text that describes a semantic modification to the query image. Existing methods in CIR struggle to accurately represent the…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-28 · Eric Xing, Pranavi Kolouju, Robert Pless, Abby Stylianou, Nathan Jacobs

A Sanity Check on Composed Image Retrieval

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image, and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, their performance is…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-15 · Yikun Liu, Jiangchao Yao, Weidi Xie, Yanfeng Wang

A Comprehensive Survey on Composed Image Retrieval

Composed Image Retrieval (CIR) is an emerging yet challenging task that allows users to search for target images using a multimodal query, comprising a reference image and a modification text specifying the user's desired changes to the…

Multimedia · Computer Science · 2025-03-05 · Xuemeng Song, Haoqiang Lin, Haokun Wen, Bohan Hou, Mingzhu Xu, Liqiang Nie

Zero-Shot Composed Image Retrieval with Textual Inversion

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that describes the difference between the two images. The high effort and cost required for labeling…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-22 · Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto Del Bimbo

Do Composed Image Retrieval Benchmarks Require Multimodal Composition?

Composed Image Retrieval (CIR) is a multimodal retrieval task where a query consists of a reference image and a textual modification, and the goal is to retrieve a target image satisfying both. In principle, strong performance on CIR…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-19 · Matteo Attimonelli, Alessandro De Bellis, Aryo Pradipta Gema, Rohit Saxena, Monica Sekoyan, Wai-Chung Kwan, Claudio Pomo, Alessandro Suglia, Dietmar Jannach, Tommaso Di Noia, Pasquale Minervini

iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval

Given a query consisting of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images visually similar to the reference one while incorporating the changes specified in the relative caption. The…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-29 · Lorenzo Agnolucci, Alberto Baldrati, Alberto Del Bimbo, Marco Bertini

Scaling Prompt Instructed Zero Shot Composed Image Retrieval with Image-Only Data

Composed Image Retrieval (CIR) is the task of retrieving images matching a reference image augmented with a text, where the text describes changes to the reference image in natural language. Traditionally, models designed for CIR have…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-16 · Yiqun Duan, Sameera Ramasinghe, Stephen Gould, Ajanthan Thalaiyasingam

Target-Guided Composed Image Retrieval

Composed image retrieval (CIR) is a new and flexible image retrieval paradigm, which can retrieve the target image for a multimodal query, including a reference image and its corresponding modification text. Although existing efforts have…

Multimedia · Computer Science · 2023-09-06 · Haokun Wen, Xian Zhang, Xuemeng Song, Yinwei Wei, Liqiang Nie

FineCIR: Explicit Parsing of Fine-Grained Modification Semantics for Composed Image Retrieval

Composed Image Retrieval (CIR) facilitates image retrieval through a multimodal query consisting of a reference image and modification text. The reference image defines the retrieval context, while the modification text specifies desired…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-28 · Zixu Li, Zhiheng Fu, Yupeng Hu, Zhiwei Chen, Haokun Wen, Liqiang Nie

FIRE-CIR: Fine-grained Reasoning for Composed Fashion Image Retrieval

Composed image retrieval (CIR) aims to retrieve a target image that depicts a reference image modified by a textual description. While recent vision-language models (VLMs) achieve promising CIR performance by embedding images and text into…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-13 · François Gardères, Camille-Sovanneary Gauthier, Jean Ponce, Shizhe Chen

TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval

Composed Image Retrieval (CIR) is an important image retrieval paradigm that enables users to retrieve a target image using a multimodal query that consists of a reference image and modification text. Although research on CIR has made…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-27 · Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Yongqi Li, Liqiang Nie

Data Roaming and Quality Assessment for Composed Image Retrieval

The task of Composed Image Retrieval (CoIR) involves queries that combine image and text modalities, allowing users to express their intent more effectively. However, current CoIR datasets are orders of magnitude smaller compared to other…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-21 · Matan Levy, Rami Ben-Ari, Nir Darshan, Dani Lischinski

Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing

Composed Image Retrieval (CIR) is a pivotal and complex task in multimodal understanding. Current CIR benchmarks typically feature limited query categories and fail to capture the diverse requirements of real-world scenarios. To bridge this…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-23 · Tingyu Song, Yanzhao Zhang, Mingxin Li, Zhuoning Guo, Dingkun Long, Pengjun Xie, Siyue Zhang, Yilun Zhao, Shu Wu

Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval

The task of composed image retrieval (CIR) aims to retrieve images based on the query image and the text describing the users' intent. Existing methods have made great progress with the advanced large vision-language (VL) model in CIR task,…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-05 · Yongchao Du, Min Wang, Wengang Zhou, Shuping Hui, Houqiang Li

Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval

Composed image retrieval (CIR) aims to retrieve the target image based on a multimodal query, i.e., a reference image paired with corresponding modification text. Recent CIR studies leverage vision-language pre-trained (VLP) methods as the…

Multimedia · Computer Science · 2024-04-25 · Haokun Wen, Xuemeng Song, Xiaolin Chen, Yinwei Wei, Liqiang Nie, Tat-Seng Chua

TMCIR: Token Merge Benefits Composed Image Retrieval

Composed Image Retrieval (CIR) retrieves target images using a multi-modal query that combines a reference image with text describing desired modifications. The primary challenge is effectively fusing this visual and textual information.…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-16 · Chaoyang Wang, Zeyu Zhang, Long Teng, Zijun Li, Shichao Kan

Sentence-level Prompts Benefit Composed Image Retrieval

Composed image retrieval (CIR) is the task of retrieving specific images by using a query that involves both a reference image and a relative caption. Most existing CIR models adopt the late-fusion strategy to combine visual and language…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-10 · Yang Bai, Xinxing Xu, Yong Liu, Salman Khan, Fahad Khan, Wangmeng Zuo, Rick Siow Mong Goh, Chun-Mei Feng

Compositional Image Retrieval via Instruction-Aware Contrastive Learning

Composed Image Retrieval (CIR) involves retrieving a target image based on a composed query of an image paired with text that specifies modifications or changes to the visual reference. CIR is inherently an instruction-following task, as…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-10 · Wenliang Zhong, Weizhi An, Feng Jiang, Hehuan Ma, Yuzhi Guo, Junzhou Huang

good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval

Composed image retrieval (CIR) enables users to search images using a reference image combined with textual modifications. Recent advances in vision-language models have improved CIR, but dataset limitations remain a barrier. Existing…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · Pranavi Kolouju, Eric Xing, Robert Pless, Nathan Jacobs, Abby Stylianou

Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

The Composed Image Retrieval (CIR) task aims to retrieve target images using a composed query consisting of a reference image and a modified text. Advanced methods often utilize contrastive learning as the optimization objective, which…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-08 · Zhangchi Feng, Richong Zhang, Zhijie Nie