Related papers: Visual Zero-Shot E-Commerce Product Attribute Valu…

Zero-Shot Visual Classification with Guided Cropping

Pretrained vision-language models, such as CLIP, show promising zero-shot performance across a wide variety of datasets. For closed-set classification tasks, however, there is an inherent limitation: CLIP image encoders are typically…

Computer Vision and Pattern Recognition · Computer Science 2023-09-14 Piyapat Saranrittichai, Mauricio Munoz, Volker Fischer, Chaithanya Kumar Mummadi

Large Scale Generative Multimodal Attribute Extraction for E-commerce Attributes

E-commerce websites (e.g. Amazon) have a plethora of structured and unstructured information (text and images) present on the product pages. Sellers often either don't label or mislabel values of the attributes (e.g. color, size etc.) for…

Computer Vision and Pattern Recognition · Computer Science 2023-06-02 Anant Khandelwal, Happy Mittal, Shreyas Sunil Kulkarni, Deepak Gupta

Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models

Audio-visual zero-shot learning methods commonly build on features extracted from pre-trained models, e.g. video or audio classification models. However, existing benchmarks predate the popularization of large multi-modal models, such as…

Computer Vision and Pattern Recognition · Computer Science 2024-04-10 David Kurzendörfer, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata

WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation

Visual anomaly classification and segmentation are vital for automating industrial quality inspection. The focus of prior research in the field has been on training custom models for each quality inspection task, which requires…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jongheon Jeong, Yang Zou, Taewan Kim, Dongqing Zhang, Avinash Ravichandran, Onkar Dabeer

VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation

Recently, large-scale vision-language models such as CLIP have demonstrated immense potential in the zero-shot anomaly segmentation (ZSAS) task, utilizing a unified model to directly detect anomalies on any unseen product with painstakingly…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Zhen Qu, Xian Tao, Mukesh Prasad, Fei Shen, Zhengtao Zhang, Xinyi Gong, Guiguang Ding

PAM: Understanding Product Images in Cross Product Category Attribute Extraction

Understanding product attributes plays an important role in improving online shopping experience for customers and serves as an integral part for constructing a product knowledge graph. Most existing methods focus on attribute extraction…

Computer Vision and Pattern Recognition · Computer Science 2021-06-10 Rongmei Lin, Xiang He, Jie Feng, Nasser Zalmout, Yan Liang, Li Xiong, Xin Luna Dong

Product Information Extraction using ChatGPT

Structured product data in the form of attribute/value pairs is the foundation of many e-commerce applications such as faceted product search, product comparison, and product recommendation. Product offers often only contain textual…

Computation and Language · Computer Science 2023-06-28 Alexander Brinkmann, Roee Shraga, Reng Chiz Der, Christian Bizer

Boosting Multi-Modal E-commerce Attribute Value Extraction via Unified Learning Scheme and Dynamic Range Minimization

With the prosperity of the e-commerce industry, various modalities, e.g., vision and language, are utilized to describe product items. It is an enormous challenge to understand such diversified data, especially via extracting the…

Computer Vision and Pattern Recognition · Computer Science 2023-04-07 Mengyin Liu, Chao Zhu, Hongyu Gao, Weibo Gu, Hongfa Wang, Wei Liu, Xu-cheng Yin

Zoom-shot: Fast and Efficient Unsupervised Zero-Shot Transfer of CLIP to Vision Encoders with Multimodal Loss

The fusion of vision and language has brought about a transformative shift in computer vision through the emergence of Vision-Language Models (VLMs). However, the resource-intensive nature of existing VLMs poses a significant challenge. We…

Computer Vision and Pattern Recognition · Computer Science 2024-01-23 Jordan Shipard, Arnold Wiliem, Kien Nguyen Thanh, Wei Xiang, Clinton Fookes

Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition

In this study, we define and tackle zero shot "real" classification by description, a novel task that evaluates the ability of Vision-Language Models (VLMs) like CLIP to classify objects based solely on descriptive attributes, excluding…

Computer Vision and Pattern Recognition · Computer Science 2024-12-19 Ethan Baron, Idan Tankel, Peter Tu, Guy Ben-Yosef

Will It Zero-Shot?: Predicting Zero-Shot Classification Performance For Arbitrary Queries

Vision-Language Models like CLIP create aligned embedding spaces for text and images, making it possible for anyone to build a visual classifier by simply naming the classes they want to distinguish. However, a model that works well in one…

Computer Vision and Pattern Recognition · Computer Science 2026-03-26 Kevin Robbins, Xiaotong Liu, Yu Wu, Le Sun, Grady McPeak, Abby Stylianou, Robert Pless

Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product

Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product recommendations, and product retrieval. While in the real world, the attribute values of a product are usually incomplete and vary…

Computation and Language · Computer Science 2020-09-16 Tiangang Zhu, Yue Wang, Haoran Li, Youzheng Wu, Xiaodong He, Bowen Zhou

Zero-Shot Vehicle Model Recognition via Text-Based Retrieval-Augmented Generation

Vehicle make and model recognition (VMMR) is an important task in intelligent transportation systems, but existing approaches struggle to adapt to newly released models. Contrastive Language-Image Pretraining (CLIP) provides strong…

Computer Vision and Pattern Recognition · Computer Science 2025-10-22 Wei-Chia Chang, Yan-Ann Chen

Zero-shot Object Counting with Good Exemplars

Zero-shot object counting (ZOC) aims to enumerate objects in images using only the names of object classes during testing, without the need for manual annotations. However, a critical challenge in current ZOC methods lies in their inability…

Computer Vision and Pattern Recognition · Computer Science 2024-07-10 Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Zheng Wang, Xian Zhong, Shengfeng He

Multi-Label Zero-Shot Product Attribute-Value Extraction

E-commerce platforms should provide detailed product descriptions (attribute values) for effective product search and recommendation. However, attribute value information is typically not available for new products. To predict unseen…

Information Retrieval · Computer Science 2024-02-15 Jiaying Gong, Hoda Eldardiry

Enhancing Spatio-Temporal Zero-shot Action Recognition with Language-driven Description Attributes

Vision-Language Models (VLMs) have demonstrated impressive capabilities in zero-shot action recognition by learning to associate video embeddings with class embeddings. However, a significant challenge arises when relying solely on action…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Yehna Kim, Young-Eun Kim, Seong-Whan Lee

ViP$^2$-CLIP: Visual-Perception Prompting with Unified Alignment for Zero-Shot Anomaly Detection

Zero-shot anomaly detection (ZSAD) aims to detect anomalies without any target domain training samples, relying solely on external auxiliary data. Existing CLIP-based methods attempt to activate the model's ZSAD potential via handcrafted or…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Ziteng Yang, Jingzehua Xu, Yanshu Li, Zepeng Li, Yeqiang Wang, Xinghui Li

DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training

Large-scale pre-trained multi-modal models (e.g., CLIP) demonstrate strong zero-shot transfer capability in many discriminative tasks. Their adaptation to zero-shot image-conditioned text generation tasks has drawn increasing interest.…

Computer Vision and Pattern Recognition · Computer Science 2023-03-07 Wei Li, Linchao Zhu, Longyin Wen, Yi Yang

VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models

Vision-language models trained on large, randomly collected data have had a significant impact in many areas since they appeared. But as they show great performance in various fields, such as image-text retrieval, their inner workings are still…

Computer Vision and Pattern Recognition · Computer Science 2022-09-15 Felix Vogel, Nina Shvetsova, Leonid Karlinsky, Hilde Kuehne

Zero-Shot Video Moment Retrieval from Frozen Vision-Language Models

Accurate video moment retrieval (VMR) requires universal visual-textual correlations that can handle unknown vocabulary and unseen scenes. However, the learned correlations are likely either biased when derived from a limited amount of…

Computer Vision and Pattern Recognition · Computer Science 2023-09-06 Dezhao Luo, Jiabo Huang, Shaogang Gong, Hailin Jin, Yang Liu