
Related papers: Diffusion Model as a Generalist Segmentation Learn…

200 papers

Pre-trained text-image discriminative models, such as CLIP, have been explored for open-vocabulary semantic segmentation with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes.…

Computer Vision and Pattern Recognition · Computer Science 2024-01-23 Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu

Foundation models have exhibited unprecedented capabilities in tackling many domains and tasks. Models such as CLIP are currently widely used to bridge cross-modal representations, and text-to-image diffusion models are arguably the leading…

Computer Vision and Pattern Recognition · Computer Science 2025-11-19 Barbara Toniella Corradini, Mustafa Shukor, Paul Couairon, Guillaume Couairon, Franco Scarselli, Matthieu Cord

The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model, in the form of segmentation map, i.e., simultaneously generating images and segmentation masks for the corresponding…

Computer Vision and Pattern Recognition · Computer Science 2023-08-11 Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

Learning from a large corpus of data, pre-trained models have achieved impressive progress nowadays. As popular generative pre-training, diffusion models capture both low-level visual knowledge and high-level semantic relations. In this…

Computer Vision and Pattern Recognition · Computer Science 2023-03-20 Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang, Yanfeng Wang

Text-based image segmentation aims to delineate object boundaries within an image from text prompts, offering higher flexibility and broader application scope compared to traditional fixed-category segmentation tasks. Recent studies have…

Computer Vision and Pattern Recognition · Computer Science 2026-05-07 Zishen Qu, Xuesong Li, Haijian Gu, Hongwei Kang, Quan Meng, Tianrui Niu, Xin Yang, Ruidong Pan

Pre-trained diffusion models have demonstrated remarkable proficiency in synthesizing images across a wide range of scenarios with customizable prompts, indicating their effective capacity to capture universal features. Motivated by this,…

Computer Vision and Pattern Recognition · Computer Science 2024-11-22 Yuxiang Ji, Boyong He, Chenyuan Qu, Zhuoyue Tan, Chuan Qin, Liaoni Wu

The Diffusion Model has not only garnered noteworthy achievements in the realm of image generation but has also demonstrated its potential as an effective pretraining method utilizing unlabeled data. Drawing from the extensive potential…

Computer Vision and Pattern Recognition · Computer Science 2024-10-30 Muzhi Zhu, Yang Liu, Zekai Luo, Chenchen Jing, Hao Chen, Guangkai Xu, Xinlong Wang, Chunhua Shen

In this paper, we investigate the use of diffusion models which are pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding. We propose a novel method, namely Diff2Scene, which leverages frozen…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher

Diffusion Probabilistic Models (DPMs) have demonstrated significant potential in 3D medical image segmentation tasks. However, their high computational cost and inability to fully capture global 3D contextual information limit their…

Image and Video Processing · Electrical Eng. & Systems 2025-04-17 Kangbo Ma

Language-driven image segmentation is a fundamental task in vision-language understanding, requiring models to segment regions of an image corresponding to natural language expressions. Traditional methods approach this as a discriminative…

Computer Vision and Pattern Recognition · Computer Science 2025-08-28 Yuhao Chen, Shubin Chen, Liang Lin, Guangrun Wang

While originally designed for image generation, diffusion models have recently been shown to provide excellent pretrained feature representations for semantic segmentation. Intrigued by this result, we set out to explore how well…

Computer Vision and Pattern Recognition · Computer Science 2023-07-06 Rui Gong, Martin Danelljan, Han Sun, Julio Delgado Mangas, Luc Van Gool

Text-to-image diffusion models excel at translating language prompts into photorealistic images by implicitly grounding textual concepts through their cross-modal attention mechanisms. Recent multi-modal diffusion transformers extend this…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Chaehyun Kim, Heeseong Shin, Eunbeen Hong, Heeji Yoon, Anurag Arnab, Paul Hongsuck Seo, Sunghwan Hong, Seungryong Kim

Denoising diffusion probabilistic models have recently received much research attention since they outperform alternative approaches, such as GANs, and currently provide state-of-the-art generative performance. The superior performance of…

Computer Vision and Pattern Recognition · Computer Science 2022-03-17 Dmitry Baranchuk, Ivan Rubachev, Andrey Voynov, Valentin Khrulkov, Artem Babenko

Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to single-generation modeling. Can we generalize diffusion models with the ability of…

Computer Vision and Pattern Recognition · Computer Science 2024-09-26 Changyou Chen, Han Ding, Bunyamin Sisman, Yi Xu, Ouye Xie, Benjamin Z. Yao, Son Dinh Tran, Belinda Zeng

Recent advances in deep learning have shown that learning robust feature representations is critical for the success of many computer vision tasks, including medical image segmentation. In particular, both transformer and…

Computer Vision and Pattern Recognition · Computer Science 2025-02-03 David Li, Anvar Kurmukov, Mikhail Goncharov, Roman Sokolov, Mikhail Belyaev

The advance of generative models for images has inspired various training techniques for image recognition utilizing synthetic images. In semantic segmentation, one promising approach is extracting pseudo-masks from attention maps in…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Ryota Yoshihashi, Yuya Otsuka, Kenji Doi, Tomohiro Tanaka, Hirokatsu Kataoka

This paper introduces an approach, named DFormer, for universal image segmentation. The proposed DFormer views universal image segmentation task as a denoising process using a diffusion model. DFormer first adds various levels of Gaussian…

Computer Vision and Pattern Recognition · Computer Science 2023-06-09 Hefeng Wang, Jiale Cao, Rao Muhammad Anwer, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang

Remote sensing semantic segmentation must address both what the ground objects are within an image and where they are located. Consequently, segmentation models must ensure not only the semantic correctness of large-scale patches…

Computer Vision and Pattern Recognition · Computer Science 2026-01-28 Hao Wang, Keyan Hu, Xin Guo, Haifeng Li, Chao Tao

Building generalized models that can solve many computer vision tasks simultaneously is an intriguing direction. Recent works have shown that the image itself can be used as a natural interface for general-purpose visual perception and demonstrated…

Computer Vision and Pattern Recognition · Computer Science 2024-07-02 Yue Fan, Yongqin Xian, Xiaohua Zhai, Alexander Kolesnikov, Muhammad Ferjad Naeem, Bernt Schiele, Federico Tombari

Diffusion models have gained tremendous success in text-to-image generation, yet still lag behind with visual understanding tasks, an area dominated by autoregressive vision-language models. We propose a large-scale and fully end-to-end…

Computer Vision and Pattern Recognition · Computer Science 2025-04-03 Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang