Ctrl&Shift: High-Quality Geometry-Aware Object Manipulation in Visual Generation

Penghui Ruan; Bojia Zi; Xianbiao Qi; Youze Huang; Rong Xiao; Pichao Wang; Jiannong Cao; Yuhui Shi

Computer Vision and Pattern Recognition 2026-02-13 v1

Abstract

Object-level manipulation (relocating or reorienting objects in images or videos while preserving scene realism) is central to film post-production, AR, and creative editing. Yet existing methods struggle to jointly achieve three core goals: background preservation, geometric consistency under viewpoint shifts, and user-controllable transformations. Geometry-based approaches offer precise control but require explicit 3D reconstruction and generalize poorly; diffusion-based methods generalize better but lack fine-grained geometric control. We present Ctrl&Shift, an end-to-end diffusion framework that achieves geometry-consistent object manipulation without explicit 3D representations. Our key insight is to decompose manipulation into two stages, object removal and reference-guided inpainting under explicit camera pose control, and to encode both within a unified diffusion process. To enable precise, disentangled control, we design a multi-task, multi-stage training strategy that separates background, identity, and pose signals across tasks. To improve generalization, we introduce a scalable real-world dataset construction pipeline that generates paired image and video samples with estimated relative camera poses. Extensive experiments demonstrate that Ctrl&Shift achieves state-of-the-art results in fidelity, viewpoint consistency, and controllability. To our knowledge, this is the first framework to unify fine-grained geometric control and real-world generalization for object manipulation without relying on any explicit 3D modeling.

Keywords

text-to-3D generation; image editing; diffusion model

Cite

@article{arxiv.2602.11440,
  title   = {Ctrl\&Shift: High-Quality Geometry-Aware Object Manipulation in Visual Generation},
  author  = {Penghui Ruan and Bojia Zi and Xianbiao Qi and Youze Huang and Rong Xiao and Pichao Wang and Jiannong Cao and Yuhui Shi},
  journal = {arXiv preprint arXiv:2602.11440},
  year   = {2026}
}

Comments

Accepted at ICLR 2026
