Related papers: Compositional Visual Generation with Composable Di…

Implementing and Experimenting with Diffusion Models for Text-to-Image Generation

Taking advantage of the many recent advances in deep learning, text-to-image generative models currently have the merit of attracting the general public attention. Two of these models, DALL-E 2 and Imagen, have demonstrated that highly…

Computer Vision and Pattern Recognition · Computer Science 2022-09-23 Robin Zbinden

Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation

Diffusion-based text-to-image generation models like GLIDE and DALLE-2 have gained wide success recently for their superior performance in turning complex text inputs into images of high quality and wide diversity. In particular, they are…

Computer Vision and Pattern Recognition · Computer Science 2022-11-16 Zhihong Pan, Xin Zhou, Hao Tian

Generating Intermediate Representations for Compositional Text-To-Image Generation

Text-to-image diffusion models have demonstrated an impressive ability to produce high-quality outputs. However, they often struggle to accurately follow fine-grained spatial information in an input text. To this end, we propose a…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Ran Galun, Sagie Benaim

Divide and Compose with Score Based Generative Models

While score based generative models, or diffusion models, have found success in image synthesis, they are often coupled with text data or image label to be able to manipulate and conditionally generate images. Even though manipulation of…

Computer Vision and Pattern Recognition · Computer Science 2023-02-07 Sandesh Ghimire, Armand Comas, Davin Hill, Aria Masoomi, Octavia Camps, Jennifer Dy

Compositional Image Decomposition with Diffusion Models

Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other…

Computer Vision and Pattern Recognition · Computer Science 2024-06-28 Jocelin Su, Nan Liu, Yanbo Wang, Joshua B. Tenenbaum, Yilun Du

Compositional Visual Generation and Inference with Energy Based Models

A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge. In this paper we show that energy-based models can exhibit this…

Computer Vision and Pattern Recognition · Computer Science 2020-12-18 Yilun Du, Shuang Li, Igor Mordatch

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

Deep generative models allow for photorealistic image synthesis at high resolutions. But for many applications, this is not enough: content creation also needs to be controllable. While several recent works investigate how to disentangle…

Computer Vision and Pattern Recognition · Computer Science 2021-04-30 Michael Niemeyer, Andreas Geiger

DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance

Large-scale generative models, such as text-to-image diffusion models, have garnered widespread attention across diverse domains due to their creative and high-fidelity image generation. Nonetheless, existing large-scale diffusion models…

Computer Vision and Pattern Recognition · Computer Science 2024-08-28 Younghyun Kim, Geunmin Hwang, Junyu Zhang, Eunbyung Park

Controlled and Conditional Text to Image Generation with Diffusion Prior

Denoising Diffusion models have shown remarkable performance in generating diverse, high quality images from text. Numerous techniques have been proposed on top of or in alignment with models like Stable Diffusion and Imagen that generate…

Computer Vision and Pattern Recognition · Computer Science 2023-08-02 Pranav Aggarwal, Hareesh Ravi, Naveen Marri, Sachin Kelkar, Fengbin Chen, Vinh Khuc, Midhun Harikumar, Ritiz Tambi, Sudharshan Reddy Kakumanu, Purvak Lapsiya, Alvin Ghouas, Sarah Saber, Malavika Ramprasad, Baldo Faieta, Ajinkya Kale

Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

Personalized text-to-image models allow users to generate varied styles of images (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Fanyue Wei, Wei Zeng, Zhenyang Li, Dawei Yin, Lixin Duan, Wen Li

Improving Compositional Text-to-image Generation with Large Vision-Language Models

Recent advancements in text-to-image models, particularly diffusion models, have shown significant promise. However, compositional text-to-image models frequently encounter difficulties in generating high-quality images that accurately…

Computer Vision and Pattern Recognition · Computer Science 2023-10-11 Song Wen, Guian Fang, Renrui Zhang, Peng Gao, Hao Dong, Dimitris Metaxas

Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models

We argue that diffusion models' success in modeling complex distributions is, for the most part, coming from their input conditioning. This paper investigates the representation used to condition diffusion models from the perspective that…

Computer Vision and Pattern Recognition · Computer Science 2026-01-07 Samuel Lavoie, Michael Noukhovitch, Aaron Courville

Compositional Scene Understanding through Inverse Generative Modeling

Generative models have demonstrated remarkable abilities in generating high-fidelity visual content. In this work, we explore how generative models can further be used not only to synthesize visual content but also to understand the…

Computer Vision and Pattern Recognition · Computer Science 2025-06-25 Yanbo Wang, Justin Dauwels, Yilun Du

DiffCollage: Parallel Generation of Large Content with Diffusion Models

We present DiffCollage, a compositional diffusion model that can generate large content by leveraging diffusion models trained on generating pieces of the large content. Our approach is based on a factor graph representation where each…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Qinsheng Zhang, Jiaming Song, Xun Huang, Yongxin Chen, Ming-Yu Liu

ControlCom: Controllable Image Composition using Diffusion Model

Image composition targets at synthesizing a realistic composite image from a pair of foreground and background images. Recently, generative composition methods are built on large pretrained diffusion models to generate composite images,…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Bo Zhang, Yuxuan Duan, Jun Lan, Yan Hong, Huijia Zhu, Weiqiang Wang, Li Niu

Generative Modelling With Inverse Heat Dissipation

While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and…

Computer Vision and Pattern Recognition · Computer Science 2023-04-14 Severi Rissanen, Markus Heinonen, Arno Solin

Exploring Compositional Visual Generation with Latent Classifier Guidance

Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space…

Computer Vision and Pattern Recognition · Computer Science 2023-05-25 Changhao Shi, Haomiao Ni, Kai Li, Shaobo Han, Mingfu Liang, Martin Renqiang Min

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

Modern generative models exhibit unprecedented capabilities to generate extremely realistic data. However, given the inherent compositionality of the real world, reliable use of these models in practical applications requires that they…

Machine Learning · Computer Science 2025-07-29 Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka

Canvas-to-Image: Compositional Image Generation with Multimodal Controls

While modern diffusion models excel at generating high-quality and diverse images, they still struggle with high-fidelity compositional and multimodal control, particularly when users simultaneously specify text prompts, subject references,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Yusuf Dalva, Guocheng Gordon Qian, Maya Goldenberg, Tsai-Shien Chen, Kfir Aberman, Sergey Tulyakov, Pinar Yanardag, Kuan-Chieh Jackson Wang

Directed Diffusion: Direct Control of Object Placement through Attention Guidance

Text-guided diffusion models such as DALLE-2, Imagen, eDiff-I, and Stable Diffusion are able to generate an effectively endless variety of images given only a short text prompt describing the desired image content. In many cases the images…

Computer Vision and Pattern Recognition · Computer Science 2023-09-27 Wan-Duo Kurt Ma, J. P. Lewis, Avisek Lahiri, Thomas Leung, W. Bastiaan Kleijn