Related papers: Adding Conditional Control to Text-to-Image Diffus…

CCM: Adding Conditional Controls to Text-to-Image Consistency Models

Consistency Models (CMs) have shown promise in creating visual content efficiently and with high quality. However, the way to add new conditional controls to the pretrained CMs has not been explored. In this technical report, we consider…

Computer Vision and Pattern Recognition · Computer Science 2023-12-13 Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Yu Liu, Xueyang Fu, Zheng-Jun Zha

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions. However, despite their success, text descriptions often…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong

Heeding the Inner Voice: Aligning ControlNet Training via Intermediate Features Feedback

Despite significant progress in text-to-image diffusion models, achieving precise spatial control over generated outputs remains challenging. ControlNet addresses this by introducing an auxiliary conditioning module, while ControlNet++…

Computer Vision and Pattern Recognition · Computer Science 2025-07-04 Nina Konovalova, Maxim Nikolaev, Andrey Kuznetsov, Aibek Alanov

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Recent approaches such as ControlNet offer users fine-grained spatial control over text-to-image (T2I) diffusion models. However, auxiliary modules have to be trained for each type of spatial condition, model architecture, and checkpoint,…

Computer Vision and Pattern Recognition · Computer Science 2023-12-13 Sicheng Mo, Fangzhou Mu, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, Bolei Zhou

ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems

The field of image synthesis has made tremendous strides forward in recent years. Besides defining the desired output image with text prompts, an intuitive approach is to additionally use spatial guidance in the form of an image, such as a…

Computer Vision and Pattern Recognition · Computer Science 2024-08-13 Denis Zavadski, Johann-Friedrich Feiden, Carsten Rother

Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Control

While text-to-image diffusion models can generate high-quality images from textual descriptions, they generally lack fine-grained control over the visual composition of the generated images. Some recent works tackle this problem by training…

Computer Vision and Pattern Recognition · Computer Science 2024-02-22 Denis Lukovnikov, Asja Fischer

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

To enhance the controllability of text-to-image diffusion models, existing efforts like ControlNet incorporated image-based conditional controls. In this paper, we reveal that existing methods still face significant challenges in generating…

Computer Vision and Pattern Recognition · Computer Science 2024-11-20 Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen

CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation

Recently, large-scale diffusion models have made impressive progress in text-to-image (T2I) generation. To further equip these T2I models with fine-grained spatial control, approaches like ControlNet introduce an extra network that learns…

Computer Vision and Pattern Recognition · Computer Science 2025-03-04 Yifeng Xu, Zhenliang He, Shiguang Shan, Xilin Chen

Local Conditional Controlling for Text-to-Image Diffusion Models

Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level structure controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired…

Computer Vision and Pattern Recognition · Computer Science 2024-08-23 Yibo Zhao, Liang Peng, Yang Yang, Zekai Luo, Hengjia Li, Yao Chen, Zheng Yang, Xiaofei He, Wei Zhao, Qinglin Lu, Boxi Wu, Wei Liu

Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt

Text-to-image generation has witnessed great progress, especially with the recent advancements in diffusion models. Since texts cannot provide detailed conditions like object appearance, reference images are usually leveraged for the…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Zhiqi Huang, Huixin Xiong, Haoyu Wang, Longguang Wang, Zhiheng Li

Active Learning Inspired ControlNet Guidance for Augmenting Semantic Segmentation Datasets

Recent advances in conditional image generation from diffusion models have shown great potential in achieving impressive image quality while preserving the constraints introduced by the user. In particular, ControlNet enables precise…

Computer Vision and Pattern Recognition · Computer Science 2025-03-13 Hannah Kniesel, Pedro Hermosilla, Timo Ropinski

Proportion and Perspective Control for Flow-Based Image Generation

While modern text-to-image diffusion models generate high-fidelity images, they offer limited control over the spatial and geometric structure of the output. To address this, we introduce and evaluate two ControlNets specialized for…

Computer Vision and Pattern Recognition · Computer Science 2025-10-28 Julien Boudier, Hugo Caselles-Dupré

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

ControlNets are widely used for adding spatial control to text-to-image diffusion models with different conditions, such as depth maps, scribbles/sketches, and human poses. However, when it comes to controllable video generation,…

Computer Vision and Pattern Recognition · Computer Science 2024-05-27 Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal

ECNet: Effective Controllable Text-to-Image Diffusion Models

The conditional text-to-image diffusion models have garnered significant attention in recent years. However, the precision of these models is often compromised mainly for two reasons, ambiguous condition input and inadequate condition…

Computer Vision and Pattern Recognition · Computer Science 2024-03-28 Sicheng Li, Keqiang Sun, Zhixin Lai, Xiaoshi Wu, Feng Qiu, Haoran Xie, Kazunori Miyata, Hongsheng Li

Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation

Text-conditional diffusion models are able to generate high-fidelity images with diverse contents. However, linguistic representations frequently exhibit ambiguous descriptions of the envisioned objective imagery, requiring the…

Computer Vision and Pattern Recognition · Computer Science 2024-03-01 Minghui Hu, Jianbin Zheng, Daqing Liu, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham

Control3D: Towards Controllable Text-to-3D Generation

Recent remarkable advances in large-scale text-to-image diffusion models have inspired a significant breakthrough in text-to-3D generation, pursuing 3D content creation solely from a given text prompt. However, existing text-to-3D…

Computer Vision and Pattern Recognition · Computer Science 2023-11-10 Yang Chen, Yingwei Pan, Yehao Li, Ting Yao, Tao Mei

Scene Graph Conditioning in Latent Diffusion

Diffusion models excel in image generation but lack detailed semantic control using text prompts. Additional techniques have been developed to address this limitation. However, conditioning diffusion models solely on text-based descriptions…

Computer Vision and Pattern Recognition · Computer Science 2023-10-17 Frank Fundel

LOVECon: Text-driven Training-Free Long Video Editing with ControlNet

Leveraging pre-trained conditional diffusion models for video editing without further tuning has gained increasing attention due to its promise in film production, advertising, etc. Yet, seminal works in this line fall short in generation…

Computer Vision and Pattern Recognition · Computer Science 2024-05-29 Zhenyi Liao, Zhijie Deng

ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors

Recently, the multimedia community has witnessed the rise of diffusion models trained on large-scale multi-modal data for visual content creation, particularly in the field of text-to-image generation. In this paper, we propose a new task…

Computer Vision and Pattern Recognition · Computer Science 2023-11-10 Jingwen Chen, Yingwei Pan, Ting Yao, Tao Mei

SemanticControl: A Training-Free Approach for Handling Loosely Aligned Visual Conditions in ControlNet

ControlNet has enabled detailed spatial control in text-to-image diffusion models by incorporating additional visual conditions such as depth or edge maps. However, its effectiveness heavily depends on the availability of visual conditions…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Woosung Joung, Daewon Chae, Jinkyu Kim