Related papers: ControlCom: Controllable Image Composition using D…

Canvas-to-Image: Compositional Image Generation with Multimodal Controls

While modern diffusion models excel at generating high-quality and diverse images, they still struggle with high-fidelity compositional and multimodal control, particularly when users simultaneously specify text prompts, subject references,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Yusuf Dalva, Guocheng Gordon Qian, Maya Goldenberg, Tsai-Shien Chen, Kfir Aberman, Sergey Tulyakov, Pinar Yanardag, Kuan-Chieh Jackson Wang

DreamCom: Finetuning Text-guided Inpainting Model for Image Composition

The goal of image composition is merging a foreground object into a background image to obtain a realistic composite image. Recently, generative composition methods are built on large pretrained diffusion models, due to their unprecedented…

Computer Vision and Pattern Recognition · Computer Science 2024-01-25 Lingxiao Lu, Jiangtong Li, Bo Zhang, Li Niu

Making Images Real Again: A Comprehensive Survey on Deep Image Composition

As a common image editing operation, image composition (object insertion) aims to combine the foreground from one image and another background image, to produce a composite image. However, there are many issues that could make the composite…

Computer Vision and Pattern Recognition · Computer Science 2026-03-20 Li Niu, Wenyan Cong, Liu Liu, Yan Hong, Bo Zhang, Jing Liang, Liqing Zhang

Composer: Creative and Controllable Image Synthesis with Composable Conditions

Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability. This work offers a new generation paradigm that allows flexible control of the output image,…

Computer Vision and Pattern Recognition · Computer Science 2023-02-23 Lianghua Huang, Di Chen, Yu Liu, Yujun Shen, Deli Zhao, Jingren Zhou

Image Harmonization with Diffusion Model

Image composition in image editing involves merging a foreground image with a background image to create a composite. Inconsistent lighting conditions between the foreground and background often result in unrealistic composites. Image…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Jiajie Li, Jian Wang, Chen Wang, Jinjun Xiong

ControlMat: A Controlled Generative Approach to Material Capture

Material reconstruction from a photograph is a key component of 3D content creation democratization. We propose to formulate this ill-posed problem as a controlled synthesis one, leveraging the recent progress in generative deep networks.…

Computer Vision and Pattern Recognition · Computer Science 2024-08-28 Giuseppe Vecchio, Rosalie Martin, Arthur Roullier, Adrien Kaiser, Romain Rouffet, Valentin Deschaintre, Tamy Boubekeur

Controllable Coupled Image Generation via Diffusion Models

We provide an attention-level control method for the task of coupled image generation, where "coupled" means that multiple simultaneously generated images are expected to have the same or very similar backgrounds. While backgrounds coupled,…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Chenfei Yuan, Nanshan Jia, Hangqi Li, Peter W. Glynn, Zeyu Zheng

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image, and fast adaptation to new tasks still remains an open challenge,…

Computer Vision and Pattern Recognition · Computer Science 2023-02-17 Omer Bar-Tal, Lior Yariv, Yaron Lipman, Tali Dekel

FilterPrompt: A Simple yet Efficient Approach to Guide Image Appearance Transfer in Diffusion Models

In controllable generation tasks, flexibly manipulating the generated images to attain a desired appearance or structure based on a single input image cue remains a critical and longstanding challenge. Achieving this requires the effective…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Xi Wang, Yichen Peng, Heng Fang, Yilin Wang, Haoran Xie, Xi Yang, Chuntao Li

FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image. Rather than concentrating on specific use cases such as appearance editing (image harmonization) or semantic editing…

Computer Vision and Pattern Recognition · Computer Science 2024-07-09 Zhekai Chen, Wen Wang, Zhen Yang, Zeqing Yuan, Hao Chen, Chunhua Shen

Composite Diffusion | whole >= Σ parts

For an artist or a graphic designer, the spatial layout of a scene is a critical design choice. However, existing text-to-image diffusion models provide limited support for incorporating spatial information. This paper introduces Composite…

Computer Vision and Pattern Recognition · Computer Science 2023-07-27 Vikram Jamwal, Ramaneswaran S

CareCom: Generative Image Composition with Calibrated Reference Features

Image composition aims to seamlessly insert foreground object into background. Despite the huge progress in generative image composition, the existing methods are still struggling with simultaneous detail preservation and foreground…

Computer Vision and Pattern Recognition · Computer Science 2025-11-17 Jiaxuan Chen, Bo Zhang, Qingdong He, Jinlong Peng, Li Niu

ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors

Recently, the multimedia community has witnessed the rise of diffusion models trained on large-scale multi-modal data for visual content creation, particularly in the field of text-to-image generation. In this paper, we propose a new task…

Computer Vision and Pattern Recognition · Computer Science 2023-11-10 Jingwen Chen, Yingwei Pan, Ting Yao, Tao Mei

Towards Controllable Image Generation through Representation-Conditioned Diffusion Models

Diffusion models have emerged as powerful tools for high-quality image generation and editing, but guiding these models to produce specific outputs remains a challenge. Conventional approaches rely on conditioning mechanisms, such as text…

Computer Vision and Pattern Recognition · Computer Science 2026-05-27 Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen

PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering

Image composition involves seamlessly integrating given objects into a specific visual context. Current training-free methods rely on composing attention weights from several samplers to guide the generator. However, since these weights are…

Computer Vision and Pattern Recognition · Computer Science 2024-08-21 Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin

Composing Parts for Expressive Object Generation

Image composition and generation are processes where the artists need control over various parts of the generated images. However, the current state-of-the-art generation models, like Stable Diffusion, cannot handle fine-grained part-level…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Harsh Rangwani, Aishwarya Agarwal, Kuldeep Kulkarni, R. Venkatesh Babu, Srikrishna Karanam

Compositional Image Decomposition with Diffusion Models

Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other…

Computer Vision and Pattern Recognition · Computer Science 2024-06-28 Jocelin Su, Nan Liu, Yanbo Wang, Joshua B. Tenenbaum, Yilun Du

Diffusion Self-Guidance for Controllable Image Generation

Large-scale generative models are capable of producing high-quality images from detailed text descriptions. However, many aspects of an image are difficult or impossible to convey through text. We introduce self-guidance, a method that…

Computer Vision and Pattern Recognition · Computer Science 2023-06-13 Dave Epstein, Allan Jabri, Ben Poole, Alexei A. Efros, Aleksander Holynski

Thinking Outside the BBox: Unconstrained Generative Object Compositing

Compositing an object into an image involves multiple non-trivial sub-tasks such as object placement and scaling, color/lighting harmonization, viewpoint/geometry adjustment, and shadow/reflection generation. Recent generative image…

Computer Vision and Pattern Recognition · Computer Science 2024-09-12 Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang, Jianming Zhang, Yizhi Song, Dan Ruta, Andrew Gilbert, John Collomosse, Soo Ye Kim

Composition and Alignment of Diffusion Models using Constrained Learning

Diffusion models have become prevalent in generative modeling due to their ability to sample from complex distributions. To improve the quality of generated samples and their compliance with user requirements, two commonly used methods are:…

Machine Learning · Computer Science 2025-12-01 Shervin Khalafi, Ignacio Hounie, Dongsheng Ding, Alejandro Ribeiro