Related papers: ControlNet++: Improving Conditional Controls with …

Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback

Despite significant progress in text-to-image diffusion models, achieving precise spatial control over generated outputs remains challenging. ControlNet addresses this by introducing an auxiliary conditioning module, while ControlNet++…

Computer Vision and Pattern Recognition · Computer Science 2025-07-04 Nina Konovalova , Maxim Nikolaev , Andrey Kuznetsov , Aibek Alanov

CCM: Adding Conditional Controls to Text-to-Image Consistency Models

Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality. However, the way to add new conditional controls to the pretrained CMs has not been explored. In this technical report, we consider…

Computer Vision and Pattern Recognition · Computer Science 2023-12-13 Jie Xiao , Kai Zhu , Han Zhang , Zhiheng Liu , Yujun Shen , Yu Liu , Xueyang Fu , Zheng-Jun Zha

DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation

To enhance the controllability of text-to-image diffusion models, current ControlNet-like models have explored various control signals to dictate image attributes. However, existing methods either handle conditions inefficiently or use a…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Qingdong He , Jinlong Peng , Pengcheng Xu , Boyuan Jiang , Xiaobin Hu , Donghao Luo , Yong Liu , Yabiao Wang , Chengjie Wang , Xiangtai Li , Jiangning Zhang

ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems

The field of image synthesis has made tremendous strides forward in recent years. Besides defining the desired output image with text prompts, an intuitive approach is to additionally use spatial guidance in the form of an image, such as a…

Computer Vision and Pattern Recognition · Computer Science 2024-08-13 Denis Zavadski , Johann-Friedrich Feiden , Carsten Rother

Adding Conditional Control to Text-to-Image Diffusion Models

We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Lvmin Zhang , Anyi Rao , Maneesh Agrawala

Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling

In this paper, we focus on the task of conditional image generation, where an image is synthesized according to user instructions. The critical challenge underpinning this task is ensuring both the fidelity of the generated images and their…

Computer Vision and Pattern Recognition · Computer Science 2025-03-20 Guiyu Zhang , Huan-ang Gao , Zijian Jiang , Hao Zhao , Zhedong Zheng

Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Control

While text-to-image diffusion models can generate high-quality images from textual descriptions, they generally lack fine-grained control over the visual composition of the generated images. Some recent works tackle this problem by training…

Computer Vision and Pattern Recognition · Computer Science 2024-02-22 Denis Lukovnikov , Asja Fischer

Minimal Impact ControlNet: Advancing Multi-ControlNet Integration

With the advancement of diffusion models, there is a growing demand for high-quality, controllable image generation, particularly through methods that utilize one or multiple control signals based on ControlNet. However, in current…

Machine Learning · Computer Science 2025-06-03 Shikun Sun , Min Zhou , Zixuan Wang , Xubin Li , Tiezheng Ge , Zijie Ye , Xiaoyu Qin , Junliang Xing , Bo Zheng , Jia Jia

Active Learning Inspired ControlNet Guidance for Augmenting Semantic Segmentation Datasets

Recent advances in conditional image generation from diffusion models have shown great potential in achieving impressive image quality while preserving the constraints introduced by the user. In particular, ControlNet enables precise…

Computer Vision and Pattern Recognition · Computer Science 2025-03-13 Hannah Kniesel , Pedro Hermosilla , Timo Ropinski

SemanticControl: A Training-Free Approach for Handling Loosely Aligned Visual Conditions in ControlNet

ControlNet has enabled detailed spatial control in text-to-image diffusion models by incorporating additional visual conditions such as depth or edge maps. However, its effectiveness heavily depends on the availability of visual conditions…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Woosung Joung , Daewon Chae , Jinkyu Kim

Local Conditional Controlling for Text-to-Image Diffusion Models

Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level structure controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired…

Computer Vision and Pattern Recognition · Computer Science 2024-08-23 Yibo Zhao , Liang Peng , Yang Yang , Zekai Luo , Hengjia Li , Yao Chen , Zheng Yang , Xiaofei He , Wei Zhao , Qinglin Lu , Boxi Wu , Wei Liu

ControlAR: Controllable Image Generation with Autoregressive Models

Autoregressive (AR) models have reformulated image generation as next-token prediction, demonstrating remarkable potential and emerging as strong competitors to diffusion models. However, control-to-image generation, akin to ControlNet,…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Zongming Li , Tianheng Cheng , Shoufa Chen , Peize Sun , Haocheng Shen , Longjin Ran , Xiaoxin Chen , Wenyu Liu , Xinggang Wang

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

Diffusion models have demonstrated remarkable and robust abilities in both image and video generation. To achieve greater control over generated results, researchers introduce additional architectures, such as ControlNet, Adapters and…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Bohao Peng , Jian Wang , Yuechen Zhang , Wenbo Li , Ming-Chang Yang , Jiaya Jia

PixelPonder: Dynamic Patch Adaptation for Enhanced Multi-Conditional Text-to-Image Generation

Recent advances in diffusion-based text-to-image generation have demonstrated promising results through visual condition control. However, existing ControlNet-like methods struggle with compositional visual conditioning - simultaneously…

Computer Vision and Pattern Recognition · Computer Science 2026-05-25 Yanjie Pan , Qingdong He , Zhengkai Jiang , Pengcheng Xu , Chaoyi Wang , Jinlong Peng , Haoxuan Wang , Yun Cao , Zhenye Gan , Mingmin Chi , Bo Peng , Yabiao Wang

ControlCom: Controllable Image Composition using Diffusion Model

Image composition aims to synthesize a realistic composite image from a pair of foreground and background images. Recently, generative composition methods are built on large pretrained diffusion models to generate composite images,…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Bo Zhang , Yuxuan Duan , Jun Lan , Yan Hong , Huijia Zhu , Weiqiang Wang , Li Niu

OmniControlNet: Dual-stage Integration for Conditional Image Generation

We provide a two-way integration for the widely adopted ControlNet by integrating external condition generation algorithms into a single dense prediction method and incorporating its individually trained image generation processes into a…

Computer Vision and Pattern Recognition · Computer Science 2024-06-11 Yilin Wang , Haiyang Xu , Xiang Zhang , Zeyuan Chen , Zhizhou Sha , Zirui Wang , Zhuowen Tu

Music ControlNet: Multiple Time-varying Controls for Music Generation

Text-to-music generation models are now capable of generating high-quality music audio in broad styles. However, text control is primarily suitable for the manipulation of global musical attributes like genre, mood, and tempo, and is less…

Sound · Computer Science 2023-11-14 Shih-Lun Wu , Chris Donahue , Shinji Watanabe , Nicholas J. Bryan

Compensation Sampling for Improved Convergence in Diffusion Models

Diffusion models achieve remarkable quality in image generation, but at a cost. Iterative denoising requires many time steps to produce high fidelity images. We argue that the denoising process is crucially limited by an accumulation of the…

Computer Vision and Pattern Recognition · Computer Science 2023-12-12 Hui Lu , Albert Ali Salah , Ronald Poppe

REFINE-CONTROL: A Semi-supervised Distillation Method For Conditional Image Generation

Conditional image generation models have achieved remarkable results by leveraging text-based control to generate customized images. However, the high resource demands of these models and the scarcity of well-annotated data have hindered…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Yicheng Jiang , Jin Yuan , Hua Yuan , Yao Zhang , Yong Rui

A Practical Investigation of Spatially-Controlled Image Generation with Transformers

Enabling image generation models to be spatially controlled is an important area of research, empowering users to better generate images according to their own fine-grained specifications via e.g. edge maps, poses. Although this task has…

Computer Vision and Pattern Recognition · Computer Science 2025-11-05 Guoxuan Xia , Harleen Hanspal , Petru-Daniel Tudosiu , Shifeng Zhang , Sarah Parisot