Related papers: EFDiT: Efficient Fine-grained Image Generation Usi…

FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes

The class-conditional image generation based on diffusion models is renowned for generating high-quality and diverse images. However, most prior efforts focus on generating images for general categories, e.g., 1000 classes in ImageNet-1k. A…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-05 · Ziying Pan, Kun Wang, Gang Li, Feihong He, Yongxuan Lai

EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models

Diffusion models have recently received increasing research attention for their remarkable transfer abilities in semantic segmentation tasks. However, generating fine-grained segmentation masks with diffusion models often requires…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-23 · Koichi Namekata, Amirmojtaba Sabour, Sanja Fidler, Seung Wook Kim

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Latent diffusion models (LDMs) dominate high-quality image generation, yet integrating representation learning with generative modeling remains a challenge. We introduce a novel generative image modeling framework that seamlessly bridges…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-23 · Theodoros Kouzelis, Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis

Local-Global Context-Aware and Structure-Preserving Image Super-Resolution

Diffusion models have recently achieved significant success in various image manipulation tasks, including image super-resolution and perceptual quality enhancement. Pretrained text-to-image models, such as Stable Diffusion, have exhibited…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-16 · Sanchar Palit, Subhasis Chaudhuri, Biplab Banerjee

Fine-grained Appearance Transfer with Diffusion Models

Image-to-image translation (I2I), and particularly its subfield of appearance transfer, which seeks to alter the visual appearance between images while maintaining structural coherence, presents formidable challenges. Despite significant…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-29 · Yuteng Ye, Guanwen Li, Hang Zhou, Cai Jiale, Junqing Yu, Yawei Luo, Zikai Song, Qilong Xing, Youjia Zhang, Wei Yang

Fine-grained Defocus Blur Control for Generative Image Models

Current text-to-image diffusion models excel at generating diverse, high-quality images, yet they struggle to incorporate fine-grained camera metadata such as precise aperture settings. In this work, we introduce a novel text-to-image…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-15 · Ayush Shrivastava, Connelly Barnes, Xuaner Zhang, Lingzhi Zhang, Andrew Owens, Sohrab Amirghodsi, Eli Shechtman

High-Resolution Image Synthesis with Latent Diffusion Models

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-14 · Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer

Fine-grained Semantic Constraint in Image Synthesis

In this paper, we propose a multi-stage and high-resolution model for image synthesis that uses fine-grained attributes and masks as input. With a fine-grained attribute, the proposed model can detailedly constrain the features of the…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-13 · Pengyang Li, Donghui Wang

Fixed Point Diffusion Models

We introduce the Fixed Point Diffusion Model (FPDM), a novel approach to image generation that integrates the concept of fixed point solving into the framework of diffusion-based generative modeling. Our approach embeds an implicit fixed…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-18 · Xingjian Bai, Luke Melas-Kyriazi

Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models

Recent advancements in diffusion models have notably improved the perceptual quality of generated images in text-to-image synthesis tasks. However, diffusion models often struggle to produce images that accurately reflect the intended…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-12 · Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Tiviatis Sim, Kenji Kawaguchi

DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning

While diffusion models excel at image synthesis, useful representations have been shown to emerge from generative pre-training, suggesting a path towards unified generative and discriminative learning. However, suboptimal semantic flow…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-23 · Weilai Xiang, Hongyu Yang, Di Huang, Yunhong Wang

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

In this work, we investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes. In addition, the generated images should have arbitrary image aspect ratios. When…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-12 · Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan

Towards Controllable Image Generation through Representation-Conditioned Diffusion Models

Diffusion models have emerged as powerful tools for high-quality image generation and editing, but guiding these models to produce specific outputs remains a challenge. Conventional approaches rely on conditioning mechanisms, such as text…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-27 · Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen

Extreme Video Compression with Pre-trained Diffusion Models

Diffusion models have achieved remarkable success in generating high quality image and video data. More recently, they have also been used for image compression with high perceptual quality. In this paper, we present a novel approach to…

Image and Video Processing · Electrical Eng. & Systems · 2024-02-15 · Bohan Li, Yiming Liu, Xueyan Niu, Bo Bai, Lei Deng, Deniz Gündüz

Stencil: Subject-Driven Generation with Context Guidance

Recent text-to-image diffusion models can generate striking visuals from text prompts, but they often fail to maintain subject consistency across generations and contexts. One major limitation of current fine-tuning approaches is the…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-23 · Gordon Chen, Ziqi Huang, Cheston Tan, Ziwei Liu

Latent Space Disentanglement in Diffusion Transformers Enables Zero-shot Fine-grained Semantic Editing

Diffusion Transformers (DiTs) have achieved remarkable success in diverse and high-quality text-to-image (T2I) generation. However, how text and image latents individually and jointly contribute to the semantics of generated images, remain…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-27 · Zitao Shuai, Chenwei Wu, Zhengxu Tang, Bowen Song, Liyue Shen

High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity

In the realm of high-resolution (HR), fine-grained image segmentation, the primary challenge is balancing broad contextual awareness with the precision required for detailed object delineation, capturing intricate details and the finest…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-01 · Qian Yu, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Bo Li, Lihe Zhang, Huchuan Lu

Label-Efficient Semantic Segmentation with Diffusion Models

Denoising diffusion probabilistic models have recently received much research attention since they outperform alternative approaches, such as GANs, and currently provide state-of-the-art generative performance. The superior performance of…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-17 · Dmitry Baranchuk, Ivan Rubachev, Andrey Voynov, Valentin Khrulkov, Artem Babenko

Enhancing Image Generation Fidelity via Progressive Prompts

The diffusion transformer (DiT) architecture has attracted significant attention in image generation, achieving better fidelity, performance, and diversity. However, most existing DiT-based image generation methods focus on global-aware…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-14 · Zhen Xiong, Yuqi Li, Chuanguang Yang, Tiao Tan, Zhihong Zhu, Siyuan Li, Yue Ma

Extreme Generative Image Compression by Learning Text Embedding from Diffusion Models

Transferring large amount of high resolution images over limited bandwidth is an important but very challenging task. Compressing images using extremely low bitrates (<0.1 bpp) has been studied but it often results in low quality images of…

Image and Video Processing · Electrical Eng. & Systems · 2022-11-16 · Zhihong Pan, Xin Zhou, Hao Tian