Related papers: SPAD: Spatially Aware Multiview Diffusers

SPREAD: Spatial-Physical REasoning via geometry Aware Diffusion

Automated 3D scene generation is pivotal for applications spanning virtual reality, digital content creation, and Embodied AI. While computer graphics prioritizes aesthetic layouts, vision and robotics demand scenes that mirror real-world…

Graphics · Computer Science 2026-03-31 Minzhang Li, Kuixiang Shao, Xuebing Li, Yuyang Jiao, Yinuo Bai, Hengan Zhou, Sixian Shen, Jiayuan Gu, Jingyi Yu

Pose-Aware Diffusion for 3D Generation

Generating pose-aligned 3D objects is challenging due to the spatial mismatches and transformation ambiguities inherent in decoupled canonical-then-rotate paradigms. To this end, we introduce Pose-Aware Diffusion (PAD), a novel end-to-end…

Computer Vision and Pattern Recognition · Computer Science 2026-05-04 Zihan Zhou, Luxi Chen, Jingzhi Zhou, Yuhao Wan, Min Zhao, Baoyu Fan, Chongxuan Li

SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

In this paper, we present a novel diffusion model called SyncDreamer that generates multiview-consistent images from a single-view image. Using pretrained large-scale 2D diffusion models, recent work Zero123 demonstrates the ability to generate…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, Wenping Wang

Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation

Text-to-image synthesis has achieved high-quality results with recent advances in diffusion models. However, text input alone has high spatial ambiguity and limited user controllability. Most existing methods allow spatial control through…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Yuki Endo

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

Diffusion-based methods have achieved remarkable results in 2D image or 3D object generation; however, the generation of 3D scenes and even 360° images remains constrained, due to the limited number of scene datasets, the…

Computer Vision and Pattern Recognition · Computer Science 2024-11-01 Weicai Ye, Chenhao Ji, Zheng Chen, Junyao Gao, Xiaoshui Huang, Song-Hai Zhang, Wanli Ouyang, Tong He, Cairong Zhao, Guofeng Zhang

SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

Open-world 3D generation has recently attracted considerable attention. While many single-image-to-3D methods have yielded visually appealing outcomes, they often lack sufficient controllability and tend to produce hallucinated regions that…

Computer Vision and Pattern Recognition · Computer Science 2024-08-20 Chao Xu, Ang Li, Linghao Chen, Yulin Liu, Ruoxi Shi, Hao Su, Minghua Liu

SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations

Novel view synthesis (NVS) boosts immersive experiences in computer vision and graphics. Existing techniques, despite their progress, rely on dense multi-view observations, which restricts their application. This work takes on the challenge of…

Computer Vision and Pattern Recognition · Computer Science 2025-07-14 Songchun Zhang, Huiyao Xu, Sitong Guo, Zhongwei Xie, Hujun Bao, Weiwei Xu, Changqing Zou

Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior

Recent works on text-to-3D generation show that using only 2D diffusion supervision for 3D generation tends to produce results with inconsistent appearances (e.g., faces on the back view) and inaccurate shapes (e.g., animals with extra…

Computer Vision and Pattern Recognition · Computer Science 2024-03-15 Cheng Chen, Xiaofeng Yang, Fan Yang, Chengzeng Feng, Zhoujie Fu, Chuan-Sheng Foo, Guosheng Lin, Fayao Liu

DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views

Synthesizing novel view images from a few views is a challenging but practical problem. Existing methods often struggle with producing high-quality results or necessitate per-object optimization in such few-view settings due to the…

Computer Vision and Pattern Recognition · Computer Science 2023-06-19 Paul Yoo, Jiaxian Guo, Yutaka Matsuo, Shixiang Shane Gu

ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs

Dynamic Novel View Synthesis aims to generate photorealistic views of moving subjects from arbitrary viewpoints. This task is particularly challenging when relying on monocular video, where disentangling structure from motion is ill-posed…

Computer Vision and Pattern Recognition · Computer Science 2025-06-24 Michal Nazarczuk, Sibi Catley-Chandar, Thomas Tanay, Zhensong Zhang, Gregory Slabaugh, Eduardo Pérez-Pellitero

SPARC: Shared Perspective with Avatar Distortion for Remote Collaboration in VR

Telepresence VR systems allow for face-to-face communication, promoting the feeling of presence and understanding of nonverbal cues. However, when discussing virtual 3D objects, limitations to presence and communication cause deictic…

Human-Computer Interaction · Computer Science 2025-04-08 João Simões, Anderson Maciel, Catarina Moreira, Joaquim Jorge

Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation

Diffusion models have recently gained recognition for generating diverse and high-quality content, especially in image synthesis. These models excel not only in creating fixed-size images but also in producing panoramic images. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Xiaoyu Zhang, Teng Zhou, Xinlong Zhang, Jia Wei, Yongchuan Tang

Diffusion-Based Attention Warping for Consistent 3D Scene Editing

We present a novel method for 3D scene editing using diffusion models, designed to ensure view consistency and realism across perspectives. Our approach leverages attention features extracted from a single reference image to define the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-12 Eyal Gomel, Lior Wolf

SpaRC: Sparse Radar-Camera Fusion for 3D Object Detection

In this work, we present SpaRC, a novel Sparse fusion transformer for 3D perception that integrates multi-view image semantics with Radar and Camera point features. The fusion of radar and camera modalities has emerged as an efficient…

Computer Vision and Pattern Recognition · Computer Science 2025-09-25 Philipp Wolters, Johannes Gilg, Torben Teepe, Fabian Herzog, Felix Fent, Gerhard Rigoll

WAVE: Warp-Based View Guidance for Consistent Novel View Synthesis Using a Single Image

Generating high-quality novel views of a scene from a single image requires maintaining structural coherence across different views, referred to as view consistency. While diffusion models have driven advancements in novel view synthesis,…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Jiwoo Park, Tae Eun Choi, Youngjun Jun, Seong Jae Hwang

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images from text prompts given pixel-to-pixel correspondences (e.g., perspective crops from a panorama or multi-view images given depth…

Computer Vision and Pattern Recognition · Computer Science 2023-12-27 Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, Yasutaka Furukawa

SpaRC-AD: A Baseline for Radar-Camera Fusion in End-to-End Autonomous Driving

End-to-end autonomous driving systems promise stronger performance through unified optimization of perception, motion forecasting, and planning. However, vision-based approaches face fundamental limitations in adverse weather conditions,…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Philipp Wolters, Johannes Gilg, Torben Teepe, Gerhard Rigoll

SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition

The ability to decompose complex multi-object scenes into meaningful abstractions like objects is fundamental to achieving higher-level cognition. Previous approaches for unsupervised object-oriented scene representation learning are either…

Machine Learning · Computer Science 2020-03-17 Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Weihao Sun, Gautam Singh, Fei Deng, Jindong Jiang, Sungjin Ahn

Look Beyond: Two-Stage Scene View Generation via Panorama and Video Diffusion

Novel view synthesis (NVS) from a single image is highly ill-posed due to large unobserved regions, especially for views that deviate significantly from the input. While existing methods focus on consistency between the source and generated…

Computer Vision and Pattern Recognition · Computer Science 2025-09-03 Xueyang Kang, Zhengkang Xiang, Zezheng Zhang, Kourosh Khoshelham

IntelliCap: Intelligent Guidance for Consistent View Sampling

Novel view synthesis from images, for example, with 3D Gaussian splatting, has made great progress. Rendering fidelity and speed are now ready even for demanding virtual reality applications. However, the problem of assisting humans in…

Computer Vision and Pattern Recognition · Computer Science 2025-08-19 Ayaka Yasunaga, Hideo Saito, Dieter Schmalstieg, Shohei Mori