Related papers: Diffuse, Attend, and Segment: Unsupervised Zero-Sh…

Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models

Zero-shot referring image segmentation is a challenging task because it aims to find an instance segmentation mask based on the given referring descriptions, without training on this type of paired data. Current zero-shot methods mainly…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-04 · Minheng Ni, Yabo Zhang, Kailai Feng, Xiaoming Li, Yiwen Guo, Wangmeng Zuo

Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation

Recent progress in interactive point-prompt-based image segmentation makes it possible to significantly reduce the manual effort needed to obtain high-quality semantic labels. State-of-the-art unsupervised methods use self-supervised pre-trained models to…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-21 · Markus Karmann, Onay Urfalioglu

SeeDiff: Off-the-Shelf Seeded Mask Generation from Diffusion Models

Entrusted with the goal of pixel-level object classification, the semantic segmentation networks entail the laborious preparation of pixel-level annotation masks. To obtain pixel-level annotation masks for a given class without human…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-16 · Joon Hyun Park, Kumju Jo, Sungyong Baik

Unsupervised Segmentation by Diffusing, Walking and Cutting

We propose an unsupervised image segmentation method using features from pre-trained text-to-image diffusion models. Inspired by classic spectral clustering approaches, we construct adjacency matrices from self-attention layers between…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-27 · Daniela Ivanova, Marco Aversa, Paul Henderson, John Williamson

Self-Attention Diffusion Models for Zero-Shot Biomedical Image Segmentation: Unlocking New Frontiers in Medical Imaging

Producing high-quality segmentation masks for medical images is a fundamental challenge in biomedical image analysis. Recent research has explored large-scale supervised training to enable segmentation across various medical imaging…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · Abderrachid Hamrani, Anuradha Godavarty

SLiMe: Segment Like Me

Significant strides have been made using large vision-language models, like Stable Diffusion (SD), for a variety of downstream tasks, including image editing, image correspondence, and 3D shape generation. Inspired by these advancements, we…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-15 · Aliasghar Khani, Saeid Asgari Taghanaki, Aditya Sanghi, Ali Mahdavi Amiri, Ghassan Hamarneh

A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting

Panoptic and instance segmentation networks are often trained with specialized object detection modules, complex loss functions, and ad-hoc post-processing steps to manage the permutation-invariance of the instance masks. This work builds…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-17 · Wouter Van Gansbeke, Bert De Brabandere

MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation

Semantic segmentation is essential in computer vision for various applications, yet traditional approaches face significant challenges, including the high cost of annotation and extensive training for supervised learning. Additionally, due…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-19 · Yasufumi Kawano, Yoshimitsu Aoki

Making Training-Free Diffusion Segmentors Scale with the Generative Power

As powerful generative models, text-to-image diffusion models have recently been explored for discriminative tasks. A line of research focuses on adapting a pre-trained diffusion model to semantic segmentation without any further training,…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-30 · Benyuan Meng, Qianqian Xu, Zitai Wang, Xiaochun Cao, Longtao Huang, Qingming Huang

Segmentation-Free Guidance for Text-to-Image Diffusion Models

We introduce segmentation-free guidance, a novel method designed for text-to-image diffusion models like Stable Diffusion. Our method does not require retraining of the diffusion model. At no additional compute cost, it uses the diffusion…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-09 · Kambiz Azarian, Debasmit Das, Qiqi Hou, Fatih Porikli

Unsupervised Class Generation to Expand Semantic Segmentation Datasets

Semantic segmentation is a computer vision task where classification is performed at a pixel level. Due to this, the process of labeling images for semantic segmentation is time-consuming and expensive. To mitigate this cost there has been…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-07 · Javier Montalvo, Álvaro García-Martín, Pablo Carballeira, Juan C. SanMiguel

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models

Collecting and annotating images with pixel-wise labels is time-consuming and laborious. In contrast, synthetic data can be freely available using a generative model (e.g., DALL-E, Stable Diffusion). In this paper, we show that it is…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-23 · Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, Chunhua Shen

FA-Seg: A Fast and Accurate Diffusion-Based Method for Open-Vocabulary Segmentation

Open-vocabulary semantic segmentation (OVSS) aims to segment objects from arbitrary text categories without requiring densely annotated datasets. Although contrastive learning based models enable zero-shot segmentation, they often lose fine…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-30 · Huy Che, Vinh-Tiep Nguyen

DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut

Foundation models have emerged as powerful tools across various domains including language, vision, and multimodal tasks. While prior works have addressed unsupervised image segmentation, they significantly lag behind supervised models. In…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-03 · Paul Couairon, Mustafa Shukor, Jean-Emmanuel Haugeard, Matthieu Cord, Nicolas Thome

iSeg: An Iterative Refinement-based Framework for Training-free Segmentation

Stable Diffusion has demonstrated a strong ability to synthesize images from given text descriptions, suggesting that it contains strong semantic cues for grouping objects. Researchers have explored employing Stable Diffusion for training-free…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-10 · Lin Sun, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang

Studying Image Diffusion Features for Zero-Shot Video Object Segmentation

This paper investigates the use of large-scale diffusion models for Zero-Shot Video Object Segmentation (ZS-VOS) without fine-tuning on video data or training on any image segmentation data. While diffusion models have demonstrated strong…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-09 · Thanos Delatolas, Vicky Kalogeiton, Dim P. Papadopoulos

Your Diffusion Model is Secretly a Zero-Shot Classifier

The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive…

Machine Learning · Computer Science · 2023-09-14 · Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak

Free-Mask: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing

Current semantic segmentation models typically require a substantial amount of manually annotated data, a process that is both time-consuming and resource-intensive. Alternatively, leveraging advanced text-to-image models such as Midjourney…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-08 · Bo Gao, Jianhui Wang, Xinyuan Song, Yangfan He, Fangxu Xing, Tianyu Shi

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-26 · Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation

Few-shot segmentation focuses on the generalization of models to segment unseen objects with limited annotated samples. However, existing approaches still face two main challenges. First, huge feature distinction between support and query…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-21 · Qi Zhao, Binghao Liu, Shuchang Lyu, Huojin Chen