Related papers: Enhancing Image Layout Control with Loss-Guided Di…

Training-Free Layout Control with Cross-Attention Guidance

Recent diffusion-based generators can produce high-quality images from textual prompts. However, they often disregard textual instructions that specify the spatial layout of the composition. We propose a simple approach that achieves robust…

Computer Vision and Pattern Recognition · Computer Science 2023-11-30 Minghao Chen , Iro Laina , Andrea Vedaldi

Dense Text-to-Image Generation with Attention Modulation

Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a…

Computer Vision and Pattern Recognition · Computer Science 2023-08-25 Yunji Kim , Jiyoung Lee , Jin-Hwa Kim , Jung-Woo Ha , Jun-Yan Zhu

Design Booster: A Text-Guided Diffusion Model for Image Translation with Spatial Layout Preservation

Diffusion models are able to generate photorealistic images in arbitrary scenes. However, when applying diffusion models to image translation, there exists a trade-off between maintaining spatial structure and high-quality content. Besides,…

Computer Vision and Pattern Recognition · Computer Science 2023-02-07 Shiqi Sun , Shancheng Fang , Qian He , Wei Liu

STAY Diffusion: Styled Layout Diffusion Model for Diverse Layout-to-Image Generation

In layout-to-image (L2I) synthesis, controlled complex scenes are generated from coarse information like bounding boxes. Such a task is exciting to many downstream applications because the input layouts offer strong guidance to the…

Computer Vision and Pattern Recognition · Computer Science 2025-03-18 Ruyu Wang , Xuefeng Hou , Sabrina Schmedding , Marco F. Huber

A Comprehensive Survey on Diffusion Models and Their Applications

Diffusion Models are probabilistic models that create realistic samples by simulating the diffusion process, gradually adding and removing noise from data. These models have gained popularity in domains such as image processing, speech…

Computer Vision and Pattern Recognition · Computer Science 2024-08-21 Md Manjurul Ahsan , Shivakumar Raman , Yingtao Liu , Zahed Siddique

Move Anything with Layered Scene Diffusion

Diffusion models generate images with an unprecedented level of quality, but how can we freely rearrange image layouts? Recent works generate controllable scenes via learning spatially disentangled latent codes, but these methods do not…

Computer Vision and Pattern Recognition · Computer Science 2024-04-11 Jiawei Ren , Mengmeng Xu , Jui-Chieh Wu , Ziwei Liu , Tao Xiang , Antoine Toisoul

Guiding a Diffusion Model with a Bad Version of Itself

The primary axes of interest in image-generating diffusion models are image quality, the amount of variation in the results, and how well the results align with a given condition, e.g., a class label or a text prompt. The popular…

Computer Vision and Pattern Recognition · Computer Science 2024-12-20 Tero Karras , Miika Aittala , Tuomas Kynkäänniemi , Jaakko Lehtinen , Timo Aila , Samuli Laine

Localized Control in Diffusion Models via Latent Vector Prediction

Diffusion models emerged as a leading approach in text-to-image generation, producing high-quality images from textual descriptions. However, attempting to achieve detailed control to get a desired image solely through text remains a…

Computer Vision and Pattern Recognition · Computer Science 2026-02-12 Pablo Domingo-Gregorio , Javier Ruiz-Hidalgo

Optical Diffusion Models for Image Generation

Diffusion models generate new samples by progressively decreasing the noise from the initially provided random distribution. This inference procedure generally utilizes a trained neural network numerous times to obtain the final output,…

Optics · Physics 2024-11-01 Ilker Oguz , Niyazi Ulas Dinc , Mustafa Yildirim , Junjie Ke , Innfarn Yoo , Qifei Wang , Feng Yang , Christophe Moser , Demetri Psaltis

Guided Image Synthesis via Initial Image Editing in Diffusion Model

Diffusion models have the ability to generate high quality images by denoising pure Gaussian noise images. While previous research has primarily focused on improving the control of image generation through adjusting the denoising process,…

Computer Vision and Pattern Recognition · Computer Science 2024-10-10 Jiafeng Mao , Xueting Wang , Kiyoharu Aizawa

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified…

Computer Vision and Pattern Recognition · Computer Science 2024-04-26 Xuehai He , Weixi Feng , Tsu-Jui Fu , Varun Jampani , Arjun Akula , Pradyumna Narayana , Sugato Basu , William Yang Wang , Xin Eric Wang

Towards Controllable Image Generation through Representation-Conditioned Diffusion Models

Diffusion models have emerged as powerful tools for high-quality image generation and editing, but guiding these models to produce specific outputs remains a challenge. Conventional approaches rely on conditioning mechanisms, such as text…

Computer Vision and Pattern Recognition · Computer Science 2026-05-27 Nithesh Chandher Karthikeyan , Jonas Unger , Gabriel Eilertsen

Directed Diffusion: Direct Control of Object Placement through Attention Guidance

Text-guided diffusion models such as DALLE-2, Imagen, eDiff-I, and Stable Diffusion are able to generate an effectively endless variety of images given only a short text prompt describing the desired image content. In many cases the images…

Computer Vision and Pattern Recognition · Computer Science 2023-09-27 Wan-Duo Kurt Ma , J. P. Lewis , Avisek Lahiri , Thomas Leung , W. Bastiaan Kleijn

Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators

Diffusion models are capable of generating impressive images conditioned on text descriptions, and extensions of these models allow users to edit images at a relatively coarse scale. However, the ability to precisely edit the layout,…

Computer Vision and Pattern Recognition · Computer Science 2024-02-01 Daniel Geng , Andrew Owens

DiffLoss: unleashing diffusion model as constraint for training image restoration network

Image restoration aims to enhance low quality images, producing high quality images that exhibit natural visual characteristics and fine semantic attributes. Recently, the diffusion model has emerged as a powerful technique for image…

Computer Vision and Pattern Recognition · Computer Science 2024-07-23 Jiangtong Tan , Feng Zhao

LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models

Creating graphic layouts is a fundamental step in graphic designs. In this work, we present a novel generative model named LayoutDiffusion for automatic layout generation. As layout is typically represented as a sequence of discrete tokens,…

Computer Vision and Pattern Recognition · Computer Science 2023-08-16 Junyi Zhang , Jiaqi Guo , Shizhao Sun , Jian-Guang Lou , Dongmei Zhang

Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods

As Diffusion Models have shown promising performance, a lot of efforts have been made to improve the controllability of Diffusion Models. However, how to train Diffusion Models to have the disentangled latent spaces and how to naturally…

Computer Vision and Pattern Recognition · Computer Science 2025-07-02 Wonwoong Cho , Hareesh Ravi , Midhun Harikumar , Vinh Khuc , Krishna Kumar Singh , Jingwan Lu , David I. Inouye , Ajinkya Kale

Upsample Guidance: Scale Up Diffusion Models without Training

Diffusion models have demonstrated superior performance across various generative tasks including images, videos, and audio. However, they encounter difficulties in directly generating high-resolution samples. Previously proposed solutions…

Computer Vision and Pattern Recognition · Computer Science 2024-04-03 Juno Hwang , Yong-Hyun Park , Junghyo Jo

Training-Free Layout-to-Image Generation with Marginal Attention Constraints

Recently, many text-to-image diffusion models have excelled at generating high-resolution images from text but struggle with precise control over spatial composition and object counting. To address these challenges, prior works have…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Huancheng Chen , Jingtao Li , Weiming Zhuang , Haris Vikalo , Lingjuan Lyu

On Conditioning the Input Noise for Controlled Image Generation with Diffusion Models

Conditional image generation has paved the way for several breakthroughs in image editing, generating stock photos and 3-D object generation. This continues to be a significant area of interest with the rise of new state-of-the-art methods…

Computer Vision and Pattern Recognition · Computer Science 2022-05-10 Vedant Singh , Surgan Jandial , Ayush Chopra , Siddharth Ramesh , Balaji Krishnamurthy , Vineeth N. Balasubramanian