Related papers: Image Generation from Layout

Context-Aware Layout to Image Generation with Enhanced Object Appearance

A layout to image (L2I) generation model aims to generate a complicated image containing multiple objects (things) against natural background (stuff), conditioned on a given layout. Built upon the recent advances in generative adversarial…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-23 · Sen He, Wentong Liao, Michael Ying Yang, Yongxin Yang, Yi-Zhe Song, Bodo Rosenhahn, Tao Xiang

Attribute-guided image generation from layout

Recent approaches have achieved great success in image generation from structured inputs, e.g., semantic segmentation, scene graph or layout. Although these methods allow specification of objects and their locations at image-level, they…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-28 · Ke Ma, Bo Zhao, Leonid Sigal

Object-Centric Image Generation from Layouts

Despite recent impressive results on single-object and single-domain image generation, the generation of complex scenes with multiple objects remains challenging. In this paper, we start with the idea that a model must be able to understand…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-04 · Tristan Sylvain, Pengchuan Zhang, Yoshua Bengio, R Devon Hjelm, Shikhar Sharma

LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation

In the text-to-image generation field, recent remarkable progress in Stable Diffusion makes it possible to generate rich kinds of novel photorealistic images. However, current models still face misalignment issues (e.g., problematic spatial…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-15 · Leigang Qu, Shengqiong Wu, Hao Fei, Liqiang Nie, Tat-Seng Chua

Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

Recent advancements in text-to-image (T2I) generative models have shown remarkable capabilities in producing diverse and imaginative visuals based on text prompts. Despite the advancement, these diffusion models sometimes struggle to…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-30 · Xiaohui Chen, Yongfei Liu, Yingxiang Yang, Jianbo Yuan, Quanzeng You, Li-Ping Liu, Hongxia Yang

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects. While excelling in…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-27 · Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka

HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation

The task of layout-to-image generation involves synthesizing images based on the captions of objects and their spatial positions. Existing methods still struggle in complex layout generation, where common bad cases include object missing,…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-21 · Bo Cheng, Yuhang Ma, Liebucha Wu, Shanyuan Liu, Ao Ma, Xiaoyu Wu, Dawei Leng, Yuhui Yin

Image Generation from Scene Graphs

To truly understand the visual world our models should be able not only to recognize images but also generate them. To this end, there has been exciting recent progress on generating images from natural language descriptions. These methods…

Computer Vision and Pattern Recognition · Computer Science · 2018-04-06 · Justin Johnson, Agrim Gupta, Li Fei-Fei

DivCon: Divide and Conquer for Complex Numerical and Spatial Reasoning in Text-to-Image Generation

Diffusion-driven text-to-image (T2I) generation has achieved remarkable advancements in recent years. To further improve T2I models' capability in numerical and spatial reasoning, layout is employed as an intermedium to bridge large…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-10 · Yuhao Jia, Wenhan Tan

Prim2Room: Layout-Controllable Room Mesh Generation from Primitives

We propose Prim2Room, a novel framework for controllable room mesh generation leveraging 2D layout conditions and 3D primitive retrieval to facilitate precise 3D layout specification. Diverging from existing methods that lack control and…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-10 · Chengzeng Feng, Jiacheng Wei, Cheng Chen, Yang Li, Pan Ji, Fayao Liu, Hongdong Li, Guosheng Lin

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

Attaining a high degree of user controllability in visual generation often requires intricate, fine-grained inputs like layouts. However, such inputs impose a substantial burden on users when compared to simple text inputs. To address the…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-31 · Weixi Feng, Wanrong Zhu, Tsu-jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

Obtaining Favorable Layouts for Multiple Object Generation

Large-scale text-to-image models that can generate high-quality and diverse images based on textual prompts have shown remarkable success. These models aim ultimately to create complex scenes, and addressing the challenge of multi-subject…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-03 · Barak Battash, Amit Rozner, Lior Wolf, Ofir Lindenbaum

Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation

Recent advancements in generative models have significantly enhanced their capacity for image generation, enabling a wide range of applications such as image editing, completion and video editing. A specialized area within generative…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-14 · Jiaxin Cheng, Zixu Zhao, Tong He, Tianjun Xiao, Yicong Zhou, Zheng Zhang

OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts

Generating captions for images is a task that has recently received considerable attention. In this work we focus on caption generation for abstract scenes, or object layouts where the only information provided is a set of objects and their…

Computer Vision and Pattern Recognition · Computer Science · 2017-07-25 · Xuwang Yin, Vicente Ordonez

Constrained Graphic Layout Generation via Latent Optimization

It is common in graphic design that humans visually arrange various elements according to their design intent and semantics. For example, a title text almost always appears on top of other elements in a document. In this work, we generate…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-03 · Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi

Unifying Layout Generation with a Decoupled Diffusion Model

Layout generation aims to synthesize realistic graphic scenes consisting of elements with different attributes including category, size, position, and between-element relation. It is a crucial task for reducing the burden on heavy-duty…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-10 · Mude Hui, Zhizheng Zhang, Xiaoyi Zhang, Wenxuan Xie, Yuwang Wang, Yan Lu

RealCamo: Boosting Real Camouflage Synthesis with Layout Controls and Textual-Visual Guidance

Camouflaged image generation (CIG) has recently emerged as an efficient alternative for acquiring high-quality training data for camouflaged object detection (COD). However, existing CIG methods still suffer from a substantial gap to real…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-16 · Chunyuan Chen, Yunuo Cai, Shujuan Li, Weiyun Liang, Bin Wang, Jing Xu

Attribute2Image: Conditional Image Generation from Visual Attributes

This paper investigates a novel problem of generating images from visual attributes. We model the image as a composite of foreground and background and develop a layered generative model with disentangled latent variables that can be…

Machine Learning · Computer Science · 2016-10-11 · Xinchen Yan, Jimei Yang, Kihyuk Sohn, Honglak Lee

Generating Images with Multimodal Language Models

We propose a method to fuse frozen text-only large language models (LLMs) with pre-trained image encoder and decoder models, by mapping between their embedding spaces. Our model demonstrates a wide suite of multimodal capabilities: image…

Computation and Language · Computer Science · 2023-10-16 · Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

Recently, diffusion models have achieved great success in image synthesis. However, when it comes to the layout-to-image generation where an image often has a complex scene of multiple objects, how to make strong control over both the…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-13 · Guangcong Zheng, Xianpan Zhou, Xuewei Li, Zhongang Qi, Ying Shan, Xi Li