
Pose-Aware Diffusion for 3D Generation

Computer Vision and Pattern Recognition 2026-05-04 v1

Abstract

Generating pose-aligned 3D objects is challenging because decoupled canonical-then-rotate paradigms suffer from inherent spatial mismatches and transformation ambiguities. To address this, we introduce Pose-Aware Diffusion (PAD), a novel end-to-end diffusion framework that synthesizes 3D geometry directly within the observation space. By unprojecting monocular depth into a partial point cloud and explicitly injecting it as a 3D geometric anchor, PAD abandons canonical assumptions and enforces rigorous spatial supervision. Generating natively in the observation space resolves pose ambiguity and produces high-fidelity, pose-aligned assets. Extensive experiments demonstrate that PAD achieves superior geometric alignment and image-to-3D correspondence compared to state-of-the-art methods. Additionally, PAD extends naturally to compositional 3D scene reconstruction via a simple union of independently generated objects, highlighting its ability to preserve precise spatial layouts.
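The unprojection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify a camera model, so this assumes a standard pinhole camera with known intrinsics; the function name `unproject_depth` and the parameters `fx`, `fy`, `cx`, `cy` are hypothetical and not taken from the paper.

```python
import math


def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-space 3D points.

    Assumes a pinhole camera (an assumption, not stated in the abstract):
    depth is a list of rows of metric depths, (fx, fy) are focal lengths
    in pixels, and (cx, cy) is the principal point in pixels.
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column
            # Skip pixels with no valid depth; the cloud is only partial.
            if not math.isfinite(z) or z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Usage: a 2x2 depth map with one invalid pixel yields three points.
depth_map = [[0.0, 2.0],
             [2.0, 2.0]]
partial_cloud = unproject_depth(depth_map, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Because the points stay in the camera (observation) frame, no canonical pose or subsequent rotation is involved, which is the property the abstract relies on when it describes the point cloud as a geometric anchor.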

Cite

@article{arxiv.2605.00345,
  title  = {Pose-Aware Diffusion for 3D Generation},
  author = {Zihan Zhou and Luxi Chen and Jingzhi Zhou and Yuhao Wan and Min Zhao and Baoyu Fan and Chongxuan Li},
  journal = {arXiv preprint arXiv:2605.00345},
  year   = {2026}
}