Related papers: ICAR: Image-based Complementary Auto Reasoning

200 papers

Composed image retrieval (CIR) is a vision language task that retrieves a target image using a reference image and modification text, enabling intuitive specification of desired changes. While effectively fusing visual and textual…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-18 · Jeong-Woo Park, Young-Eun Kim, Seong-Whan Lee

Vision transformers in vision-language models typically use the same amount of compute for every image, regardless of whether it is simple or complex. We propose ICAR (Image Complexity-Aware Retrieval), an adaptive computation approach that…

Information Retrieval · Computer Science · 2026-01-16 · Mikel Williams-Lekuona, Georgina Cosma

Composed Image Retrieval (CIR) retrieves target images using a reference image paired with modification text. Despite rapid advances, all existing methods and datasets operate at the image level -- a single reference image plus modification…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-14 · Peng Yuan, Bingyin Mei, Hui Zhang

Composed image retrieval (CIR) requires multi-modal models to jointly reason over visual content and semantic modifications presented in text-image input pairs. While current CIR models achieve strong performance on common benchmark cases,…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-13 · Chenchen Zhao, Jianhuan Zhuo, Muxi Chen, Zhaohua Zhang, Wenyu Jiang, Tianwen Jiang, Qiuyong Xiao, Jihong Zhang, Qiang Xu

In visual retrieval systems, updating the embedding model requires recomputing features for every piece of data. This expensive process is referred to as backfilling. Recently, the idea of backward compatible training (BCT) was proposed. To…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-31 · Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari

Composed image retrieval (CIR) aims to retrieve a target image that depicts a reference image modified by a textual description. While recent vision-language models (VLMs) achieve promising CIR performance by embedding images and text into…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-13 · François Gardères, Camille-Sovanneary Gauthier, Jean Ponce, Shizhe Chen

Image classification, which classifies images by pre-defined categories, has been the dominant approach to visual representation learning over the last decade. Visual learning through image-text alignment, however, has emerged to show…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-25 · Yixuan Wei, Yue Cao, Zheng Zhang, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo

The field of autonomous vehicles (AVs) predominantly leverages multi-modal integration of LiDAR and camera data to achieve better performance compared to using a single modality. However, the fusion process encounters challenges in…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-22 · Sanjay Bhargav Dharavath, Tanmoy Dam, Supriyo Chakraborty, Prithwiraj Roy, Aniruddha Maiti

Multi-agent collaborative perception has emerged as a widely recognized technology in the field of autonomous driving in recent years. However, current collaborative perception predominantly relies on LiDAR point clouds, with significantly…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-16 · Shaohong Wang, Lu Bin, Xinyu Xiao, Zhiyu Xiang, Hangguan Shan, Eryun Liu

Humans can progressively learn visual concepts from easy to hard questions. To mimic this efficient learning ability, we propose a competence-aware curriculum for visual concept learning in a question-answering manner. Specifically, we…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-29 · Qing Li, Siyuan Huang, Yining Hong, Song-Chun Zhu

Composed Image Retrieval (CIR) enables fine-grained visual search by combining a reference image with a textual modification. While supervised CIR methods achieve high accuracy, their reliance on costly triplet annotations motivates…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-02 · Xin Wang, Haipeng Zhang, Mang Li, Zhaohui Xia, Yueguo Chen, Yu Zhang, Chunyu Wei

We propose FusionBERT, a novel multi-view visual fusion framework for image-3D multimodal retrieval. Existing image-3D representation learning methods predominantly focus on feature alignment of a single object image and its 3D model,…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-06 · Wei Li, Yufan Ren, Hanqing Jiang, Jianhui Ding, Zhen Peng, Leman Feng, Yichun Shentu, Guoqiang Xu, Baigui Sun

Composed Image Retrieval (CIR) is a challenging task that aims to retrieve the target image with a multimodal query, i.e., a reference image, and its complementary modification text. As previous supervised or zero-shot learning paradigms…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-13 · Bohan Hou, Haoqiang Lin, Haokun Wen, Meng Liu, Mingzhu Xu, Xuemeng Song

The rapid expansion of online fashion platforms has created an increasing demand for intelligent recommender systems capable of understanding both visual and textual cues. This paper proposes a hybrid multimodal deep learning framework for…

Information Retrieval · Computer Science · 2025-11-20 · Kamand Kalashi, Babak Teimourpour

Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that specifies the desired modification. Despite the rapid development of CIR models, their performance is…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-15 · Yikun Liu, Jiangchao Yao, Weidi Xie, Yanfeng Wang

For fashion outfits to be considered aesthetically pleasing, the garments that constitute them need to be compatible in terms of visual aspects, such as style, category and color. Previous works have defined visual compatibility as a binary…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-09 · Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Ioannis Kompatsiaris

In natural vision, feedback connections support versatile visual inference capabilities such as making sense of the occluded or noisy bottom-up sensory information or mediating pure top-down processes such as imagination. However, the…

Neurons and Cognition · Quantitative Biology · 2023-11-01 · Tahereh Toosi, Elias B. Issa

We propose a way to learn visual features that are compatible with previously computed ones even when they have different dimensions and are learned via different neural network architectures and loss functions. Compatible means that, if…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-07 · Yantao Shen, Yuanjun Xiong, Wei Xia, Stefano Soatto

The progress of composed image retrieval (CIR), a popular research direction in image retrieval, where a combined visual and textual query is used, is held back by the absence of high-quality training and evaluation data. We introduce a new…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-16 · Bill Psomas, George Retsinas, Nikos Efthymiadis, Panagiotis Filntisis, Yannis Avrithis, Petros Maragos, Ondrej Chum, Giorgos Tolias

Composed Image Retrieval (CIR), which aims to find a target image from a reference image and a modification text, presents the core challenge of performing unified reasoning across visual and semantic modalities. While current approaches…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-10 · Weihuang Lin, Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji