Related papers: Ctrl&Shift: High-Quality Geometry-Aware Object Man…

200 papers

Precise geometric control in image generation is essential for engineering & product design and creative industries to control 3D object features accurately in image space. Traditional 3D editing approaches are time-consuming and demand…

Computer Vision and Pattern Recognition · Computer Science 2025-10-28 Phillip Mueller, Talip Uenlue, Sebastian Schmidt, Marcel Kollovieh, Jiajie Fan, Stephan Guennemann, Lars Mikelsons

The success of image generative models has enabled us to build methods that can edit images based on text or other user input. However, these methods are bespoke, imprecise, require additional information, or are limited to only 2D image…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Rahul Sajnani, Jeroen Vanbaar, Jie Min, Kapil Katyal, Srinath Sridhar

Although diffusion-based models can generate high-quality and high-resolution video sequences from textual or image inputs, they lack explicit integration of geometric cues when controlling scene lighting and visual appearance across…

Computer Vision and Pattern Recognition · Computer Science 2025-06-04 Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Ronald Clark, Ming-Hsuan Yang

We propose a novel diffusion-based framework for reconstructing 3D geometry of hand-held objects from monocular RGB images by leveraging hand-object interaction as geometric guidance. Our method conditions a latent diffusion model on an…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Ayce Idil Aytekin, Helge Rhodin, Rishabh Dabral, Christian Theobalt

Object manipulation in images aims not only to edit the object's presentation but also to endow objects with motion. Previous methods encountered challenges in concurrently handling static editing and dynamic generation, while also struggling…

Computer Vision and Pattern Recognition · Computer Science 2025-01-23 Ruisi Zhao, Zechuan Zhang, Zongxin Yang, Yi Yang

We seek to give users precise control over diffusion-based image generation by modeling complex scenes as sequences of layers, which define the desired spatial arrangement and visual attributes of objects in the scene. Collage Diffusion…

Computer Vision and Pattern Recognition · Computer Science 2023-09-01 Vishnu Sarukkai, Linden Li, Arden Ma, Christopher Ré, Kayvon Fatahalian

The goal of general-purpose robotics is to create agents that can seamlessly adapt to and operate in diverse, unstructured human environments. Imitation learning has become a key paradigm for robotic manipulation, yet collecting large-scale…

Computer Vision and Pattern Recognition · Computer Science 2026-01-07 Liu Liu, Xiaofeng Wang, Guosheng Zhao, Keyu Li, Wenkang Qin, Jiagang Zhu, Jiaxiong Qiu, Zheng Zhu, Guan Huang, Zhizhong Su

Text-guided image editing has recently experienced rapid development. However, simultaneously performing multiple editing actions on a single image, such as background replacement and specific subject attribute changes, while maintaining…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Pengzhi Li, Qinxuan Huang, Yikang Ding, Zhiheng Li

The ability to manipulate objects into desired configurations is a fundamental requirement for robots to complete various practical applications. While certain goals can be achieved by picking and placing the objects of interest directly,…

Robotics · Computer Science 2023-09-18 Utkarsh A. Mishra, Yongxin Chen

Controllable video generation has attracted significant attention, largely due to advances in video diffusion models. In domains such as autonomous driving, it is essential to develop highly accurate predictions for object motions. This…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Ge Ya Luo, Zhi Hao Luo, Anthony Gosselin, Alexia Jolicoeur-Martineau, Christopher Pal

Diffusion-based video editing methods have reached impressive quality and can transform the global style, local structure, and attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy…

Computer Vision and Pattern Recognition · Computer Science 2024-09-02 Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian

Generative image editing has recently witnessed extremely fast-paced growth. Some works use high-level conditioning such as text, while others use low-level conditioning. Nevertheless, most of them lack fine-grained control over the…

Computer Vision and Pattern Recognition · Computer Science 2024-04-10 Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Xingqian Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi

This paper explores image editing under the joint control of text and drag interactions. While recent advances in text-driven and drag-driven editing have achieved remarkable progress, they suffer from complementary limitations: text-driven…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Qihang Wang, Yaxiong Wang, Lechao Cheng, Zhun Zhong

3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild. Accurately reconstructing an object's complete 3D structure and texture has…

Computer Vision and Pattern Recognition · Computer Science 2024-11-21 Hritam Basak, Hadi Tabatabaee, Shreekant Gayaka, Ming-Feng Li, Xin Yang, Cheng-Hao Kuo, Arnie Sen, Min Sun, Zhaozheng Yin

We propose a novel image editing technique that enables 3D manipulations on single images, such as object rotation and translation. Existing 3D-aware image editing approaches typically rely on synthetic multi-view datasets for training…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Ruicheng Wang, Jianfeng Xiang, Jiaolong Yang, Xin Tong

Text-to-image diffusion models have made significant progress in image generation, allowing for effortless customized generation. However, existing image editing methods still face certain limitations when dealing with personalized image…

Computer Vision and Pattern Recognition · Computer Science 2025-09-26 Yuhong Zhang, Han Wang, Yiwen Wang, Rong Xie, Li Song

Recent advances in diffusion models have significantly improved image editing. However, challenges persist in handling geometric transformations, such as translation, rotation, and scaling, particularly in complex scenes. Existing…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Shuo Zhang, Wenzhuo Wu, Huayu Zhang, Jiarong Cheng, Xianghao Zang, Chao Ban, Hao Sun, Zhongjiang He, Tianwei Cao, Kongming Liang, Zhanyu Ma

We propose a diffusion-based approach for Text-to-Image (T2I) generation with consistent and interactive 3D layout control and editing. While prior methods improve spatial adherence using 2D cues or iterative copy-warp-paste strategies,…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Andrea Rigo, Luca Stornaiuolo, Weijie Wang, Mauro Martino, Bruno Lepri, Nicu Sebe

This paper presents DualCamCtrl, a novel end-to-end diffusion model for camera-controlled video generation. Recent works have advanced this field by representing camera poses as ray-based conditions, yet they often lack sufficient scene…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Hongfei Zhang, Kanghao Chen, Zixin Zhang, Harold Haodong Chen, Yuanhuiyi Lyu, Yuqi Zhang, Shuai Yang, Kun Zhou, Yingcong Chen

We tackle the problem of text-driven 3D generation from a geometry alignment perspective. Given a set of text prompts, we aim to generate a collection of objects with semantically corresponding parts aligned across them. Recent methods…
