English
Related papers

Related papers: Compositional Visual Generation with Composable Di…

200 papers

Taking advantage of the many recent advances in deep learning, text-to-image generative models currently have the merit of attracting the general public attention. Two of these models, DALL-E 2 and Imagen, have demonstrated that highly…

Computer Vision and Pattern Recognition · Computer Science 2022-09-23 Robin Zbinden

Diffusion-based text-to-image generation models like GLIDE and DALLE-2 have gained wide success recently for their superior performance in turning complex text inputs into images of high quality and wide diversity. In particular, they are…

Computer Vision and Pattern Recognition · Computer Science 2022-11-16 Zhihong Pan , Xin Zhou , Hao Tian

Text-to-image diffusion models have demonstrated an impressive ability to produce high-quality outputs. However, they often struggle to accurately follow fine-grained spatial information in an input text. To this end, we propose a…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Ran Galun , Sagie Benaim

While score based generative models, or diffusion models, have found success in image synthesis, they are often coupled with text data or image label to be able to manipulate and conditionally generate images. Even though manipulation of…

Computer Vision and Pattern Recognition · Computer Science 2023-02-07 Sandesh Ghimire , Armand Comas , Davin Hill , Aria Masoomi , Octavia Camps , Jennifer Dy

Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other…

Computer Vision and Pattern Recognition · Computer Science 2024-06-28 Jocelin Su , Nan Liu , Yanbo Wang , Joshua B. Tenenbaum , Yilun Du

A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge. In this paper we show that energy-based models can exhibit this…

Computer Vision and Pattern Recognition · Computer Science 2020-12-18 Yilun Du , Shuang Li , Igor Mordatch

Deep generative models allow for photorealistic image synthesis at high resolutions. But for many applications, this is not enough: content creation also needs to be controllable. While several recent works investigate how to disentangle…

Computer Vision and Pattern Recognition · Computer Science 2021-04-30 Michael Niemeyer , Andreas Geiger

Large-scale generative models, such as text-to-image diffusion models, have garnered widespread attention across diverse domains due to their creative and high-fidelity image generation. Nonetheless, existing large-scale diffusion models…

Computer Vision and Pattern Recognition · Computer Science 2024-08-28 Younghyun Kim , Geunmin Hwang , Junyu Zhang , Eunbyung Park

Denoising Diffusion models have shown remarkable performance in generating diverse, high quality images from text. Numerous techniques have been proposed on top of or in alignment with models like Stable Diffusion and Imagen that generate…

Personalized text-to-image models allow users to generate varied styles of images (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Fanyue Wei , Wei Zeng , Zhenyang Li , Dawei Yin , Lixin Duan , Wen Li

Recent advancements in text-to-image models, particularly diffusion models, have shown significant promise. However, compositional text-to-image models frequently encounter difficulties in generating high-quality images that accurately…

Computer Vision and Pattern Recognition · Computer Science 2023-10-11 Song Wen , Guian Fang , Renrui Zhang , Peng Gao , Hao Dong , Dimitris Metaxas

We argue that diffusion models' success in modeling complex distributions is, for the most part, coming from their input conditioning. This paper investigates the representation used to condition diffusion models from the perspective that…

Computer Vision and Pattern Recognition · Computer Science 2026-01-07 Samuel Lavoie , Michael Noukhovitch , Aaron Courville

Generative models have demonstrated remarkable abilities in generating high-fidelity visual content. In this work, we explore how generative models can further be used not only to synthesize visual content but also to understand the…

Computer Vision and Pattern Recognition · Computer Science 2025-06-25 Yanbo Wang , Justin Dauwels , Yilun Du

We present DiffCollage, a compositional diffusion model that can generate large content by leveraging diffusion models trained on generating pieces of the large content. Our approach is based on a factor graph representation where each…

Computer Vision and Pattern Recognition · Computer Science 2023-03-31 Qinsheng Zhang , Jiaming Song , Xun Huang , Yongxin Chen , Ming-Yu Liu

Image composition targets at synthesizing a realistic composite image from a pair of foreground and background images. Recently, generative composition methods are built on large pretrained diffusion models to generate composite images,…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Bo Zhang , Yuxuan Duan , Jun Lan , Yan Hong , Huijia Zhu , Weiqiang Wang , Li Niu

While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and…

Computer Vision and Pattern Recognition · Computer Science 2023-04-14 Severi Rissanen , Markus Heinonen , Arno Solin

Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space…

Computer Vision and Pattern Recognition · Computer Science 2023-05-25 Changhao Shi , Haomiao Ni , Kai Li , Shaobo Han , Mingfu Liang , Martin Renqiang Min

Modern generative models exhibit unprecedented capabilities to generate extremely realistic data. However, given the inherent compositionality of the real world, reliable use of these models in practical applications requires that they…

Machine Learning · Computer Science 2025-07-29 Maya Okawa , Ekdeep Singh Lubana , Robert P. Dick , Hidenori Tanaka

While modern diffusion models excel at generating high-quality and diverse images, they still struggle with high-fidelity compositional and multimodal control, particularly when users simultaneously specify text prompts, subject references,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Yusuf Dalva , Guocheng Gordon Qian , Maya Goldenberg , Tsai-Shien Chen , Kfir Aberman , Sergey Tulyakov , Pinar Yanardag , Kuan-Chieh Jackson Wang

Text-guided diffusion models such as DALLE-2, Imagen, eDiff-I, and Stable Diffusion are able to generate an effectively endless variety of images given only a short text prompt describing the desired image content. In many cases the images…

Computer Vision and Pattern Recognition · Computer Science 2023-09-27 Wan-Duo Kurt Ma , J. P. Lewis , Avisek Lahiri , Thomas Leung , W. Bastiaan Kleijn
‹ Prev 1 2 3 10 Next ›