Related papers: Language-driven Scene Synthesis using Multi-condit…

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

Recently, diffusion-based image generation methods are credited for their remarkable text-to-image generation capabilities, while still facing challenges in accurately generating multilingual scene text images. To tackle this problem, we…

Computer Vision and Pattern Recognition · Computer Science 2023-12-20 Lingjun Zhang, Xinyuan Chen, Yaohui Wang, Yue Lu, Yu Qiao

Functional 3D Scene Synthesis through Human-Scene Optimization

This paper presents a novel generative approach that outputs 3D indoor environments solely from a textual description of the scene. Current methods often treat scene synthesis as a mere layout prediction task, leading to rooms with…

Machine Learning · Computer Science 2025-02-12 Yao Wei, Matteo Toso, Pietro Morerio, Michael Ying Yang, Alessio Del Bue

Autonomous Character-Scene Interaction Synthesis from Text Instruction

Synthesizing human motions in 3D environments, particularly those with complex activities such as locomotion, hand-reaching, and human-object interaction, presents substantial demands for user-defined waypoints and stage transitions. These…

Computer Vision and Pattern Recognition · Computer Science 2024-10-10 Nan Jiang, Zimo He, Zi Wang, Hongjie Li, Yixin Chen, Siyuan Huang, Yixin Zhu

Conditional Image Synthesis with Diffusion Models: A Survey

Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective way for conditional image synthesis,…

Computer Vision and Pattern Recognition · Computer Science 2025-06-03 Zheyuan Zhan, Defang Chen, Jian-Ping Mei, Zhenghe Zhao, Jiawei Chen, Chun Chen, Siwei Lyu, Can Wang

DreamText: High Fidelity Scene Text Synthesis

Scene text synthesis involves rendering specified texts onto arbitrary images. Current methods typically formulate this task in an end-to-end manner but lack effective character-level guidance during training. Besides, their text encoders,…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Yibin Wang, Weizhong Zhang, Honghui Xu, Cheng Jin

Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models

Scene text detection techniques have garnered significant attention due to their wide-ranging applications. However, existing methods have a high demand for training data, and obtaining accurate human annotations is labor-intensive and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Ling Fu, Zijie Wu, Yingying Zhu, Yuliang Liu, Xiang Bai

Layout Agnostic Scene Text Image Synthesis with Diffusion Models

While diffusion models have significantly advanced the quality of image generation, their capability to accurately and coherently render text within these images remains a substantial challenge. Conventional diffusion-based methods for scene…

Computer Vision and Pattern Recognition · Computer Science 2024-09-17 Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan

Scene-aware Generative Network for Human Motion Synthesis

We revisit human motion synthesis, a task useful in various real world applications, in this paper. Whereas a number of methods have been developed previously for this task, they are often limited in two aspects: focusing on the poses while…

Computer Vision and Pattern Recognition · Computer Science 2021-06-01 Jingbo Wang, Sijie Yan, Bo Dai, Dahua Lin

Self-Prompting Diffusion Transformer for Open-Vocabulary Scene Text Editing via In-Context Learning

Scene text editing aims to modify text in a target region of an image while preserving surrounding background style and texture. Existing methods rely solely on image background information while neglecting the visual details of target…

Computer Vision and Pattern Recognition · Computer Science 2026-05-28 Hongxi Li, Tong Wang, Chengjing Wu, Tianbao Liu, Jiangtao Yao, Xiaochao Qu, Xinxiao Wu, Luoqi Liu, Ting Liu

Sketch-Guided Scene Image Generation

Text-to-image models are showcasing the impressive ability to create high-quality and diverse generative images. Nevertheless, the transition from freehand sketches to complex scene images remains challenging using diffusion models. In this…

Computer Vision and Pattern Recognition · Computer Science 2024-07-10 Tianyu Zhang, Xiaoxuan Xie, Xusheng Du, Haoran Xie

Compositional 3D Scene Generation using Locally Conditioned Diffusion

Designing complex 3D scenes has been a tedious, manual process requiring domain expertise. Emerging text-to-3D generative models show great promise for making this task more intuitive, but existing approaches are limited to object-level…

Computer Vision and Pattern Recognition · Computer Science 2023-03-24 Ryan Po, Gordon Wetzstein

SCENIC: Scene-aware Semantic Navigation with Instruction-guided Control

Synthesizing natural human motion that adapts to complex environments while allowing creative control remains a fundamental challenge in motion synthesis. Existing models often fall short, either by assuming flat terrain or lacking the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-23 Xiaohan Zhang, Sebastian Starke, Vladimir Guzov, Zhensong Zhang, Eduardo Pérez Pellitero, Gerard Pons-Moll

Conditional Text Image Generation with Diffusion Models

Current text recognition systems, including those for handwritten scripts and scene text, have relied heavily on image synthesis and augmentation, since it is difficult to realize real-world complexity and diversity through collecting and…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Yuanzhi Zhu, Zhaohai Li, Tianwei Wang, Mengchao He, Cong Yao

Animate Your Motion: Turning Still Images into Dynamic Videos

In recent years, diffusion models have made remarkable strides in text-to-video generation, sparking a quest for enhanced control over video outputs to more accurately reflect user intentions. Traditional efforts predominantly focus on…

Computer Vision and Pattern Recognition · Computer Science 2024-07-18 Mingxiao Li, Bo Wan, Marie-Francine Moens, Tinne Tuytelaars

Contextualized Diffusion Models for Text-Guided Image and Video Generation

Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing. Nevertheless, prevailing text-guided visual diffusion models primarily focus on incorporating text-visual…

Computer Vision and Pattern Recognition · Computer Science 2024-06-05 Ling Yang, Zhilong Zhang, Zhaochen Yu, Jingwei Liu, Minkai Xu, Stefano Ermon, Bin Cui

Mixed Diffusion for 3D Indoor Scene Synthesis

Generating realistic 3D scenes is an area of growing interest in computer vision and robotics. However, creating high-quality, diverse synthetic 3D content often requires expert intervention, making it costly and complex. Recently, efforts…

Computer Vision and Pattern Recognition · Computer Science 2024-12-11 Siyi Hu, Diego Martin Arroyo, Stephanie Debats, Fabian Manhardt, Luca Carlone, Federico Tombari

More Control for Free! Image Synthesis with Semantic Diffusion Guidance

Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from a reference image. Recently, denoising diffusion probabilistic models have been shown to generate more realistic imagery than…

Computer Vision and Pattern Recognition · Computer Science 2022-12-06 Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, Trevor Darrell

In-Context Learning Unlocked for Diffusion Models

We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. Given a pair of task-specific example images, such as depth from/to image and scribble from/to image, and a text guidance, our…

Computer Vision and Pattern Recognition · Computer Science 2023-10-20 Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou

Layout2Scene: 3D Semantic Layout Guided Scene Generation via Geometry and Appearance Diffusion Priors

3D scene generation conditioned on text prompts has significantly progressed due to the development of 2D diffusion generation models. However, the textual description of 3D scenes is inherently inaccurate and lacks fine-grained control…

Computer Vision and Pattern Recognition · Computer Science 2025-01-07 Minglin Chen, Longguang Wang, Sheng Ao, Ye Zhang, Kai Xu, Yulan Guo

Scene Graph Conditioning in Latent Diffusion

Diffusion models excel in image generation but lack detailed semantic control using text prompts. Additional techniques have been developed to address this limitation. However, conditioning diffusion models solely on text-based descriptions…

Computer Vision and Pattern Recognition · Computer Science 2023-10-17 Frank Fundel