Related papers: Diffusion Model as a Generalist Segmentation Learn…

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

Pre-trained text-image discriminative models, such as CLIP, have been explored for open-vocabulary semantic segmentation with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes.…

Computer Vision and Pattern Recognition · Computer Science 2024-01-23 Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu

FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models

Foundation models have exhibited unprecedented capabilities in tackling many domains and tasks. Models such as CLIP are currently widely used to bridge cross-modal representations, and text-to-image diffusion models are arguably the leading…

Computer Vision and Pattern Recognition · Computer Science 2025-11-19 Barbara Toniella Corradini, Mustafa Shukor, Paul Couairon, Guillaume Couairon, Franco Scarselli, Matthieu Cord

Open-vocabulary Object Segmentation with Diffusion Models

The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model, in the form of a segmentation map, i.e., simultaneously generating images and segmentation masks for the corresponding…

Computer Vision and Pattern Recognition · Computer Science 2023-08-11 Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery

Learning from a large corpus of data, pre-trained models have achieved impressive progress. As a popular form of generative pre-training, diffusion models capture both low-level visual knowledge and high-level semantic relations. In this…

Computer Vision and Pattern Recognition · Computer Science 2023-03-20 Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang, Yanfeng Wang

From Diffusion to Rectified Flow: Rethinking Text-Based Segmentation

Text-based image segmentation aims to delineate object boundaries within an image from text prompts, offering higher flexibility and broader application scope compared to traditional fixed-category segmentation tasks. Recent studies have…

Computer Vision and Pattern Recognition · Computer Science 2026-05-07 Zishen Qu, Xuesong Li, Haijian Gu, Hongwei Kang, Quan Meng, Tianrui Niu, Xin Yang, Ruidong Pan

Diffusion Features to Bridge Domain Gap for Semantic Segmentation

Pre-trained diffusion models have demonstrated remarkable proficiency in synthesizing images across a wide range of scenarios with customizable prompts, indicating their effective capacity to capture universal features. Motivated by this,…

Computer Vision and Pattern Recognition · Computer Science 2024-11-22 Yuxiang Ji, Boyong He, Chenyuan Qu, Zhuoyue Tan, Chuan Qin, Liaoni Wu

Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation

The Diffusion Model has not only garnered noteworthy achievements in the realm of image generation but has also demonstrated its potential as an effective pretraining method utilizing unlabeled data. Drawing from the extensive potential…

Computer Vision and Pattern Recognition · Computer Science 2024-10-30 Muzhi Zhu, Yang Liu, Zekai Luo, Chenchen Jing, Hao Chen, Guangkai Xu, Xinlong Wang, Chunhua Shen

Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

In this paper, we investigate the use of diffusion models which are pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding. We propose a novel method, namely Diff2Scene, which leverages frozen…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher

TextDiffSeg: Text-guided Latent Diffusion Model for 3d Medical Images Segmentation

Diffusion Probabilistic Models (DPMs) have demonstrated significant potential in 3D medical image segmentation tasks. However, their high computational cost and inability to fully capture global 3D contextual information limit their…

Image and Video Processing · Electrical Eng. & Systems 2025-04-17 Kangbo Ma

GS: Generative Segmentation via Label Diffusion

Language-driven image segmentation is a fundamental task in vision-language understanding, requiring models to segment regions of an image corresponding to natural language expressions. Traditional methods approach this as a discriminative…

Computer Vision and Pattern Recognition · Computer Science 2025-08-28 Yuhao Chen, Shubin Chen, Liang Lin, Guangrun Wang

Prompting Diffusion Representations for Cross-Domain Semantic Segmentation

While originally designed for image generation, diffusion models have recently been shown to provide excellent pretrained feature representations for semantic segmentation. Intrigued by this result, we set out to explore how well…

Computer Vision and Pattern Recognition · Computer Science 2023-07-06 Rui Gong, Martin Danelljan, Han Sun, Julio Delgado Mangas, Luc Van Gool

Seg4Diff: Unveiling Open-Vocabulary Segmentation in Text-to-Image Diffusion Transformers

Text-to-image diffusion models excel at translating language prompts into photorealistic images by implicitly grounding textual concepts through their cross-modal attention mechanisms. Recent multi-modal diffusion transformers extend this…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Chaehyun Kim, Heeseong Shin, Eunbeen Hong, Heeji Yoon, Anurag Arnab, Paul Hongsuck Seo, Sunghwan Hong, Seungryong Kim

Label-Efficient Semantic Segmentation with Diffusion Models

Denoising diffusion probabilistic models have recently received much research attention since they outperform alternative approaches, such as GANs, and currently provide state-of-the-art generative performance. The superior performance of…

Computer Vision and Pattern Recognition · Computer Science 2022-03-17 Dmitry Baranchuk, Ivan Rubachev, Andrey Voynov, Valentin Khrulkov, Artem Babenko

Diffusion Models For Multi-Modal Generative Modeling

Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to a single-generation modeling. Can we generalize diffusion models with the ability of…

Computer Vision and Pattern Recognition · Computer Science 2024-09-26 Changyou Chen, Han Ding, Bunyamin Sisman, Yi Xu, Ouye Xie, Benjamin Z. Yao, Son Dinh Tran, Belinda Zeng

Medical Semantic Segmentation with Diffusion Pretrain

Recent advances in deep learning have shown that learning robust feature representations is critical for the success of many computer vision tasks, including medical image segmentation. In particular, both transformer and…

Computer Vision and Pattern Recognition · Computer Science 2025-02-03 David Li, Anvar Kurmukov, Mikhail Goncharov, Roman Sokolov, Mikhail Belyaev

Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation

The advance of generative models for images has inspired various training techniques for image recognition utilizing synthetic images. In semantic segmentation, one promising approach is extracting pseudo-masks from attention maps in…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Ryota Yoshihashi, Yuya Otsuka, Kenji Doi, Tomohiro Tanaka, Hirokatsu Kataoka

DFormer: Diffusion-guided Transformer for Universal Image Segmentation

This paper introduces an approach, named DFormer, for universal image segmentation. The proposed DFormer views the universal image segmentation task as a denoising process using a diffusion model. DFormer first adds various levels of Gaussian…

Computer Vision and Pattern Recognition · Computer Science 2023-06-09 Hefeng Wang, Jiale Cao, Rao Muhammad Anwer, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang

A Gift from the Integration of Discriminative and Diffusion-based Generative Learning: Boundary Refinement Remote Sensing Semantic Segmentation

Remote sensing semantic segmentation must address both what the ground objects are within an image and where they are located. Consequently, segmentation models must ensure not only the semantic correctness of large-scale patches…

Computer Vision and Pattern Recognition · Computer Science 2026-01-28 Hao Wang, Keyan Hu, Xin Guo, Haifeng Li, Chao Tao

Toward a Diffusion-Based Generalist for Dense Vision Tasks

Building generalized models that can solve many computer vision tasks simultaneously is an intriguing direction. Recent works have shown that the image itself can be used as a natural interface for general-purpose visual perception and demonstrated…

Computer Vision and Pattern Recognition · Computer Science 2024-07-02 Yue Fan, Yongqin Xian, Xiaohua Zhai, Alexander Kolesnikov, Muhammad Ferjad Naeem, Bernt Schiele, Federico Tombari

Dual Diffusion for Unified Image Generation and Understanding

Diffusion models have gained tremendous success in text-to-image generation, yet still lag behind with visual understanding tasks, an area dominated by autoregressive vision-language models. We propose a large-scale and fully end-to-end…

Computer Vision and Pattern Recognition · Computer Science 2025-04-03 Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang