Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

Weijie Wang; Qihang Cao; Sensen Gao; Donny Y. Chen; Haofei Xu; Wenjing Bian; Songyou Peng; Tat-Jen Cham; Chuanxia Zheng; Andreas Geiger; Jianfei Cai; Jia-Wang Bian; Bohan Zhuang

Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence; Graphics. Version 1, 2026-04-16.

Abstract

Reconstructing 3D representations from 2D inputs is a fundamental task in computer vision and graphics, serving as a cornerstone for understanding and interacting with the physical world. While traditional methods achieve high fidelity, they are limited by slow per-scene optimization or category-specific training, which hinders their practical deployment and scalability. Hence, generalizable feed-forward 3D reconstruction has developed rapidly in recent years. By learning a model that maps images directly to 3D representations in a single forward pass, these methods enable efficient reconstruction and robust cross-scene generalization. Our survey is motivated by a critical observation: despite their diverse geometric output representations, ranging from implicit fields to explicit primitives, existing feed-forward approaches share similar high-level architectural patterns, such as image feature extraction backbones, multi-view information fusion mechanisms, and geometry-aware design principles. We therefore abstract away from these representation differences and focus on model design, proposing a novel taxonomy of design strategies that are agnostic to the output format. Our taxonomy organizes research directions into five key problems that drive recent progress: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models. To ground this taxonomy empirically and support standardized evaluation, we comprehensively review related benchmarks and datasets, and we discuss and categorize real-world applications of feed-forward 3D models. Finally, we outline future directions for open challenges such as scalability, evaluation standards, and world modeling.
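The shared architectural pattern named in the abstract (a per-view feature backbone, a multi-view fusion step, and a head that outputs 3D geometry in one forward pass) can be illustrated with a toy sketch. This is not the survey's method or any specific published model. All function names (`extract_features`, `fuse_views`, `predict_points`, `feed_forward_reconstruct`) are hypothetical. The weights are random, the "backbone" is a single linear layer with ReLU standing in for a CNN or ViT, and fusion uses plain global attention over tokens from all views, which is only one of several possible fusion mechanisms. The output is a per-pixel 3D point map, one of many possible output representations.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(images, w_backbone):
    # images: (V, N, C) = V views, N flattened pixels, C channels.
    # Hypothetical shared-weight "backbone": the same linear map + ReLU for every view.
    return np.maximum(images @ w_backbone, 0.0)  # (V, N, D)

def fuse_views(feats):
    # Hypothetical fusion: global attention where every token attends to tokens of all views.
    V, N, D = feats.shape
    tokens = feats.reshape(V * N, D)
    scores = tokens @ tokens.T / np.sqrt(D)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    fused = tokens + attn @ tokens  # residual connection
    return fused.reshape(V, N, D)

def predict_points(fused, w_head):
    # Hypothetical head: map each fused pixel feature to a 3D point (pixel-aligned point map).
    return fused @ w_head  # (V, N, 3)

def feed_forward_reconstruct(images, params):
    # A single forward pass: no per-scene optimization loop.
    feats = extract_features(images, params["backbone"])
    fused = fuse_views(feats)
    return predict_points(fused, params["head"])

V, N, C, D = 2, 16, 3, 8  # toy sizes: 2 views of 16 pixels each
params = {"backbone": rng.normal(size=(C, D)), "head": rng.normal(size=(D, 3))}
images = rng.random((V, N, C))
points = feed_forward_reconstruct(images, params)
print(points.shape)  # (2, 16, 3)
```

The contrast with per-scene optimization is that `feed_forward_reconstruct` contains no iterative fitting. In a real system, the parameters would be learned once across many scenes and then reused unchanged for new inputs.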

Keywords

3D reconstruction, scene understanding, 3D scene understanding

Cite

@article{arxiv.2604.14025,
  title  = {Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective},
  author = {Weijie Wang and Qihang Cao and Sensen Gao and Donny Y. Chen and Haofei Xu and Wenjing Bian and Songyou Peng and Tat-Jen Cham and Chuanxia Zheng and Andreas Geiger and Jianfei Cai and Jia-Wang Bian and Bohan Zhuang},
  journal= {arXiv preprint arXiv:2604.14025},
  year   = {2026}
}

Comments

67 pages, 395 references. Project page: https://ff3d-survey.github.io. Code: https://github.com/ziplab/Awesome-Feed-Forward-3D. This work has been submitted to Springer for possible publication.
