Related papers: Ctrl&Shift: High-Quality Geometry-Aware Object Man…

GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation

Precise geometric control in image generation is essential for engineering & product design and creative industries to control 3D object features accurately in image space. Traditional 3D editing approaches are time-consuming and demand…

Computer Vision and Pattern Recognition · Computer Science 2025-10-28 Phillip Mueller, Talip Uenlue, Sebastian Schmidt, Marcel Kollovieh, Jiajie Fan, Stephan Guennemann, Lars Mikelsons

GeoDiffuser: Geometry-Based Image Editing with Diffusion Models

The success of image generative models has enabled us to build methods that can edit images based on text or other user input. However, these methods are bespoke, imprecise, require additional information, or are limited to only 2D image…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Rahul Sajnani, Jeroen Vanbaar, Jie Min, Kapil Katyal, Srinath Sridhar

IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation

Although diffusion-based models can generate high-quality and high-resolution video sequences from textual or image inputs, they lack explicit integration of geometric cues when controlling scene lighting and visual appearance across…

Computer Vision and Pattern Recognition · Computer Science 2025-06-04 Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Ronald Clark, Ming-Hsuan Yang

Follow My Hold: Hand-Object Interaction Reconstruction through Geometric Guidance

We propose a novel diffusion-based framework for reconstructing 3D geometry of hand-held objects from monocular RGB images by leveraging hand-object interaction as geometric guidance. Our method conditions a latent diffusion model on an…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Ayce Idil Aytekin, Helge Rhodin, Rishabh Dabral, Christian Theobalt

3D Object Manipulation in a Single Image using Generative Models

Object manipulation in images aims to not only edit the object's presentation but also gift objects with motion. Previous methods encountered challenges in concurrently handling static editing and dynamic generation, while also struggling…

Computer Vision and Pattern Recognition · Computer Science 2025-01-23 Ruisi Zhao, Zechuan Zhang, Zongxin Yang, Yi Yang

Collage Diffusion

We seek to give users precise control over diffusion-based image generation by modeling complex scenes as sequences of layers, which define the desired spatial arrangement and visual attributes of objects in the scene. Collage Diffusion…

Computer Vision and Pattern Recognition · Computer Science 2023-09-01 Vishnu Sarukkai, Linden Li, Arden Ma, Christopher Ré, Kayvon Fatahalian

RoboTransfer: Controllable Geometry-Consistent Video Diffusion for Manipulation Policy Transfer

The goal of general-purpose robotics is to create agents that can seamlessly adapt to and operate in diverse, unstructured human environments. Imitation learning has become a key paradigm for robotic manipulation, yet collecting large-scale…

Computer Vision and Pattern Recognition · Computer Science 2026-01-07 Liu Liu, Xiaofeng Wang, Guosheng Zhao, Keyu Li, Wenkang Qin, Jiagang Zhu, Jiaxiong Qiu, Zheng Zhu, Guan Huang, Zhizhong Su

LayerDiffusion: Layered Controlled Image Editing with Diffusion Models

Text-guided image editing has recently experienced rapid development. However, simultaneously performing multiple editing actions on a single image, such as background replacement and specific subject attribute changes, while maintaining…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Pengzhi Li, Qinxuan Huang, Yikang Ding, Zhiheng Li

ReorientDiff: Diffusion Model based Reorientation for Object Manipulation

The ability to manipulate objects into desired configurations is a fundamental requirement for robots to complete various practical applications. While certain goals can be achieved by picking and placing the objects of interest directly,…

Robotics · Computer Science 2023-09-18 Utkarsh A. Mishra, Yongxin Chen

Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion

Controllable video generation has attracted significant attention, largely due to advances in video diffusion models. In domains such as autonomous driving, it is essential to develop highly accurate predictions for object motions. This…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Ge Ya Luo, Zhi Hao Luo, Anthony Gosselin, Alexia Jolicoeur-Martineau, Christopher Pal

Object-Centric Diffusion for Efficient Video Editing

Diffusion-based video editing has reached impressive quality and can transform the global style, local structure, and attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy…

Computer Vision and Pattern Recognition · Computer Science 2024-09-02 Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian

PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor

Generative image editing has recently witnessed extremely fast-paced growth. Some works use high-level conditioning such as text, while others use low-level conditioning. Nevertheless, most of them lack fine-grained control over the…

Computer Vision and Pattern Recognition · Computer Science 2024-04-10 Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Xingqian Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi

TDEdit: A Unified Diffusion Framework for Text-Drag Guided Image Manipulation

This paper explores image editing under the joint control of text and drag interactions. While recent advances in text-driven and drag-driven editing have achieved remarkable progress, they suffer from complementary limitations: text-driven…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Qihang Wang, Yaxiong Wang, Lechao Cheng, Zhun Zhong

Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors

3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild. Accurately reconstructing an object's complete 3D structure and texture has…

Computer Vision and Pattern Recognition · Computer Science 2024-11-21 Hritam Basak, Hadi Tabatabaee, Shreekant Gayaka, Ming-Feng Li, Xin Yang, Cheng-Hao Kuo, Arnie Sen, Min Sun, Zhaozheng Yin

Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors

We propose a novel image editing technique that enables 3D manipulations on single images, such as object rotation and translation. Existing 3D-aware image editing approaches typically rely on synthetic multi-view datasets for training…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Ruicheng Wang, Jianfeng Xiang, Jiaolong Yang, Xin Tong

FreeInsert: Personalized Object Insertion with Geometric and Style Control

Text-to-image diffusion models have made significant progress in image generation, allowing for effortless customized generation. However, existing image editing methods still face certain limitations when dealing with personalized image…

Computer Vision and Pattern Recognition · Computer Science 2025-09-26 Yuhong Zhang, Han Wang, Yiwen Wang, Rong Xie, Li Song

Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers

Recent advances in diffusion models have significantly improved image editing. However, challenges persist in handling geometric transformations, such as translation, rotation, and scaling, particularly in complex scenes. Existing…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Shuo Zhang, Wenzhuo Wu, Huayu Zhang, Jiarong Cheng, Xianghao Zang, Chao Ban, Hao Sun, Zhongjiang He, Tianwei Cao, Kongming Liang, Zhanyu Ma

POCI-Diff: Position Objects Consistently and Interactively with 3D-Layout Guided Diffusion

We propose a diffusion-based approach for Text-to-Image (T2I) generation with consistent and interactive 3D layout control and editing. While prior methods improve spatial adherence using 2D cues or iterative copy-warp-paste strategies,…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Andrea Rigo, Luca Stornaiuolo, Weijie Wang, Mauro Martino, Bruno Lepri, Nicu Sebe

DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation

This paper presents DualCamCtrl, a novel end-to-end diffusion model for camera-controlled video generation. Recent works have advanced this field by representing camera poses as ray-based conditions, yet they often lack sufficient scene…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Hongfei Zhang, Kanghao Chen, Zixin Zhang, Harold Haodong Chen, Yuanhuiyi Lyu, Yuqi Zhang, Shuai Yang, Kun Zhou, Yingcong Chen

A3D: Does Diffusion Dream about 3D Alignment?

We tackle the problem of text-driven 3D generation from a geometry alignment perspective. Given a set of text prompts, we aim to generate a collection of objects with semantically corresponding parts aligned across them. Recent methods…

Computer Vision and Pattern Recognition · Computer Science 2025-03-18 Savva Ignatyev, Nina Konovalova, Daniil Selikhanovych, Oleg Voynov, Nikolay Patakin, Ilya Olkov, Dmitry Senushkin, Alexey Artemov, Anton Konushin, Alexander Filippov, Peter Wonka, Evgeny Burnaev