Related papers: Jointly Generating Multi-view Consistent PBR Textu…

Canvas-to-Image: Compositional Image Generation with Multimodal Controls

While modern diffusion models excel at generating high-quality and diverse images, they still struggle with high-fidelity compositional and multimodal control, particularly when users simultaneously specify text prompts, subject references,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Yusuf Dalva, Guocheng Gordon Qian, Maya Goldenberg, Tsai-Shien Chen, Kfir Aberman, Sergey Tulyakov, Pinar Yanardag, Kuan-Chieh Jackson Wang

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image, and fast adaptation to new tasks still remains an open challenge,…

Computer Vision and Pattern Recognition · Computer Science 2023-02-17 Omer Bar-Tal, Lior Yariv, Yaron Lipman, Tali Dekel

Coupled Diffusion Sampling for Training-Free Multi-View Image Editing

We present an inference-time diffusion sampling method to perform multi-view consistent image editing using pre-trained 2D image editing models. These models can independently produce high-quality edits for each image in a set of multi-view…

Computer Vision and Pattern Recognition · Computer Science 2025-10-17 Hadi Alzayer, Yunzhi Zhang, Chen Geng, Jia-Bin Huang, Jiajun Wu

Curved Diffusion: A Generative Model With Optical Geometry Control

State-of-the-art diffusion models can generate highly realistic images based on various conditioning like text, segmentation, and depth. However, an essential aspect often overlooked is the specific camera geometry used during image…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Andrey Voynov, Amir Hertz, Moab Arar, Shlomi Fruchter, Daniel Cohen-Or

Controllable-Continuous Color Editing in Diffusion Model via Color Mapping

In recent years, text-driven image editing has made significant progress. However, due to the inherent ambiguity and discreteness of natural language, color editing still faces challenges such as insufficient precision and difficulty in…

Computer Vision and Pattern Recognition · Computer Science 2025-09-18 Yuqi Yang, Dongliang Chang, Yuanchen Fang, Yi-Zhe Song, Zhanyu Ma, Jun Guo

Table-to-Text Generation with Pretrained Diffusion Models

Diffusion models have demonstrated significant potential in achieving state-of-the-art performance across various text generation tasks. In this systematic study, we investigate their application to the table-to-text problem by adapting the…

Computation and Language · Computer Science 2024-09-24 Aleksei S. Krylov, Oleg D. Somov

Collaborative Control for Geometry-Conditioned PBR Image Generation

Graphics pipelines require physically-based rendering (PBR) materials, yet current 3D content generation approaches are built on RGB models. We propose to model the PBR image distribution directly, avoiding photometric inaccuracies in RGB…

Computer Vision and Pattern Recognition · Computer Science 2024-08-26 Shimon Vainer, Mark Boss, Mathias Parger, Konstantin Kutsy, Dante De Nigris, Ciara Rowles, Nicolas Perony, Simon Donné

TexPainter: Generative Mesh Texturing with Multi-view Consistency

The recent success of pre-trained diffusion models unlocks the possibility of the automatic generation of textures for arbitrary 3D meshes in the wild. However, these models are trained in the screen space, while converting them to a…

Computer Vision and Pattern Recognition · Computer Science 2024-06-28 Hongkun Zhang, Zherong Pan, Congyi Zhang, Lifeng Zhu, Xifeng Gao

Compass Control: Multi Object Orientation Control for Text-to-Image Generation

Existing approaches for controlling text-to-image diffusion models, while powerful, do not allow for explicit 3D object-centric control, such as precise control of object orientation. In this work, we address the problem of multi-object…

Computer Vision and Pattern Recognition · Computer Science 2025-04-11 Rishubh Parihar, Vaibhav Agrawal, Sachidanand VS, R. Venkatesh Babu

Visual Bridge: Universal Visual Perception Representations Generating

Recent advances in diffusion models have achieved remarkable success in isolated computer vision tasks such as text-to-image generation, depth estimation, and optical flow. However, these models are often restricted by a…

Computer Vision and Pattern Recognition · Computer Science 2025-11-12 Yilin Gao, Shuguang Dou, Junzhou Li, Zhiheng Yu, Yin Li, Dongsheng Jiang, Shugong Xu

FlexPainter: Flexible and Multi-View Consistent Texture Generation

Texture map production is an important part of 3D modeling and determines the rendering quality. Recently, diffusion-based methods have opened a new way for texture generation. However, restricted control flexibility and limited prompt…

Graphics · Computer Science 2025-06-04 Dongyu Yan, Leyi Wu, Jiantao Lin, Luozhou Wang, Tianshuo Xu, Zhifei Chen, Zhen Yang, Lie Xu, Shunsi Zhang, Yingcong Chen

When Do Diffusion Models learn to Generate Multiple Objects?

Text-to-image diffusion models achieve impressive visual fidelity, yet they remain unreliable in multi-object generation. Despite extensive empirical evidence of these failures, the underlying causes remain unclear. We begin by asking how…

Computer Vision and Pattern Recognition · Computer Science 2026-05-04 Yujin Jeong, Arnas Uselis, Iro Laina, Seong Joon Oh, Anna Rohrbach

Text-Guided Texturing by Synchronized Multi-View Diffusion

This paper introduces a novel approach to synthesize texture to dress up a given 3D object, given a text prompt. Based on the pretrained text-to-image (T2I) diffusion model, existing methods usually employ a project-and-inpaint approach, in…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Yuxin Liu, Minshan Xie, Hanyuan Liu, Tien-Tsin Wong

Controllable Coupled Image Generation via Diffusion Models

We provide an attention-level control method for the task of coupled image generation, where "coupled" means that multiple simultaneously generated images are expected to have the same or very similar backgrounds. While backgrounds coupled,…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Chenfei Yuan, Nanshan Jia, Hangqi Li, Peter W. Glynn, Zeyu Zheng

How to Trust Your Diffusion Model: A Convex Optimization Approach to Conformal Risk Control

Score-based generative modeling, informally referred to as diffusion models, continue to grow in popularity across several important domains and tasks. While they provide high-quality and diverse samples from empirical distributions,…

Machine Learning · Statistics 2023-12-29 Jacopo Teneggi, Matthew Tivnan, J. Webster Stayman, Jeremias Sulam

ControlCom: Controllable Image Composition using Diffusion Model

Image composition targets at synthesizing a realistic composite image from a pair of foreground and background images. Recently, generative composition methods are built on large pretrained diffusion models to generate composite images,…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Bo Zhang, Yuxuan Duan, Jun Lan, Yan Hong, Huijia Zhu, Weiqiang Wang, Li Niu

Text2Street: Controllable Text-to-image Generation for Street Views

Text-to-image generation has made remarkable progress with the emergence of diffusion models. However, it is still a difficult task to generate images for street views based on text, mainly because the road topology of street scenes is…

Computer Vision and Pattern Recognition · Computer Science 2024-02-08 Jinming Su, Songen Gu, Yiting Duan, Xingyue Chen, Junfeng Luo

An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes

A fundamental problem in the texturing of 3D meshes using pre-trained text-to-image models is to ensure multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs, where common issues…

Computer Vision and Pattern Recognition · Computer Science 2024-08-05 Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Liefeng Bo, Zilong Dong, Qixing Huang

Collage Diffusion

We seek to give users precise control over diffusion-based image generation by modeling complex scenes as sequences of layers, which define the desired spatial arrangement and visual attributes of objects in the scene. Collage Diffusion…

Computer Vision and Pattern Recognition · Computer Science 2023-09-01 Vishnu Sarukkai, Linden Li, Arden Ma, Christopher Ré, Kayvon Fatahalian

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

Recently, diffusion models have achieved great success in image synthesis. However, when it comes to the layout-to-image generation where an image often has a complex scene of multiple objects, how to make strong control over both the…

Computer Vision and Pattern Recognition · Computer Science 2024-03-13 Guangcong Zheng, Xianpan Zhou, Xuewei Li, Zhongang Qi, Ying Shan, Xi Li