Related papers: Zero-guidance Segmentation Using Zero Segment Labe…

Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels

Large-scale vision-language models like CLIP have demonstrated impressive open-vocabulary capabilities for image-level tasks, excelling in recognizing what objects are present. However, they struggle with pixel-level recognition tasks like…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-01 · Heeseong Shin, Chaehyun Kim, Sunghwan Hong, Seokju Cho, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

We introduce a method that allows to automatically segment images into semantically meaningful regions without human supervision. Derived regions are consistent across different images and coincide with human-defined semantic classes on…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-22 · Daniil Pakhomov, Sanchit Hira, Narayani Wagle, Kemar E. Green, Nassir Navab

Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples

Semantic segmentation is a key computer vision task that has been actively researched for decades. In recent years, supervised methods have reached unprecedented accuracy, however they require many pixel-level annotations for every new…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-07 · Nir Zabari, Yedid Hoshen

Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding

To bridge the gap between supervised semantic segmentation and real-world applications that acquires one model to recognize arbitrary new concepts, recent zero-shot segmentation attracts a lot of attention by exploring the relationships…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-01 · Quande Liu, Youpeng Wen, Jianhua Han, Chunjing Xu, Hang Xu, Xiaodan Liang

CLIP-S$^4$: Language-Guided Self-Supervised Semantic Segmentation

Existing semantic segmentation approaches are often limited by costly pixel-wise annotations and predefined classes. In this work, we present CLIP-S$^4$ that leverages self-supervised pixel representation learning and vision-language models…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-03 · Wenbin He, Suphanut Jamonnak, Liang Gou, Liu Ren

CLIP-GCD: Simple Language Guided Generalized Category Discovery

Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data. Prior methods leveraged self-supervised pre-training combined with supervised fine-tuning on the…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-18 · Rabah Ouldnoughi, Chia-Wen Kuo, Zsolt Kira

CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free

The emergence of CLIP has opened the way for open-world image perception. The zero-shot classification capabilities of the model are impressive but are harder to use for dense tasks such as image segmentation. Several methods have proposed…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-29 · Monika Wysoczańska, Michaël Ramamonjisoa, Tomasz Trzciński, Oriane Siméoni

Exploring Open-Vocabulary Semantic Segmentation without Human Labels

Semantic segmentation is a crucial task in computer vision that involves segmenting images into semantically meaningful regions at the pixel level. However, existing approaches often rely on expensive human annotations as supervision for…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-02 · Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Mohamed Elhoseiny, Sean Chang Culatana

Training-Free Semantic Segmentation via LLM-Supervision

Recent advancements in open vocabulary models, like CLIP, have notably advanced zero-shot classification and segmentation by utilizing natural language for class-specific embeddings. However, most research has focused on improving model…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-02 · Wenfang Sun, Yingjun Du, Gaowen Liu, Ramana Kompella, Cees G. M. Snoek

Tuning-free Universally-Supervised Semantic Segmentation

This work presents a tuning-free semantic segmentation framework based on classifying SAM masks by CLIP, which is universally applicable to various types of supervision. Initially, we utilize CLIP's zero-shot classification ability to…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-24 · Xiaobo Yang, Xiaojin Gong

Zero-Shot Pseudo Labels Generation Using SAM and CLIP for Semi-Supervised Semantic Segmentation

Semantic segmentation is a fundamental task in medical image analysis and autonomous driving and has a problem with the high cost of annotating the labels required in training. To address this problem, semantic segmentation methods based on…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-30 · Nagito Saito, Shintaro Ito, Koichi Ito, Takafumi Aoki

Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite its success, its application to OVSS faces challenges due to its initial image-level…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-12 · Tong Shao, Zhuotao Tian, Hang Zhao, Jingyong Su

Extract Free Dense Labels from CLIP

Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition. Many recent studies leverage the pre-trained CLIP models for image-level classification and manipulation. In…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-28 · Chong Zhou, Chen Change Loy, Bo Dai

CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections

In the era of foundation models, CLIP has emerged as a powerful tool for aligning text & visual modalities into a common embedding space. However, the alignment objective used to train CLIP often results in subpar visual features for…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-11 · Mohamed Fazli Imam, Rufael Fedaku Marew, Jameel Hassan, Mustansar Fiaz, Alham Fikri Aji, Hisham Cholakkal

The Solution for Language-Enhanced Image New Category Discovery

Treating texts as images, combining prompts with textual labels for prompt tuning, and leveraging the alignment properties of CLIP have been successfully applied in zero-shot multi-label image recognition. Nonetheless, relying solely on…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-09 · Haonan Xu, Dian Chao, Xiangyu Wu, Zhonghua Wan, Yang Yang

A Closer Look at Self-training for Zero-Label Semantic Segmentation

Being able to segment unseen classes not observed during training is an important technical challenge in deep learning, because of its potential to reduce the expensive annotation required for semantic segmentation. Prior zero-label…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-26 · Giuseppe Pastore, Fabio Cermelli, Yongqin Xian, Massimiliano Mancini, Zeynep Akata, Barbara Caputo

TAG: Guidance-free Open-Vocabulary Semantic Segmentation

Semantic segmentation is a crucial task in computer vision, where each pixel in an image is classified into a category. However, traditional methods face significant challenges, including the need for pixel-level annotations and extensive…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-19 · Yasufumi Kawano, Yoshimitsu Aoki

CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation

Open-vocabulary semantic segmentation presents the challenge of labeling each pixel within an image based on a wide range of text descriptions. In this work, we introduce a novel cost-based approach to adapt vision-language foundation…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-02 · Seokju Cho, Heeseong Shin, Sunghwan Hong, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim

Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

Open-vocabulary semantic segmentation aims to segment an image into semantic regions according to text descriptions, which may not have been seen during training. Recent two-stage methods first generate class-agnostic mask proposals and…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-04 · Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, Diana Marculescu

CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation

Household environments are visually diverse. Embodied agents performing Vision-and-Language Navigation (VLN) in the wild must be able to handle this diversity, while also following arbitrary language instructions. Recently, Vision-Language…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-01 · Vishnu Sashank Dorbala, Gunnar Sigurdsson, Robinson Piramuthu, Jesse Thomason, Gaurav S. Sukhatme