Speed3R: Sparse Feed-forward 3D Reconstruction Models

Computer Vision and Pattern Recognition 2026-03-10 v1 Artificial Intelligence

Abstract

While recent feed-forward 3D reconstruction models accelerate 3D reconstruction by jointly inferring dense geometry and camera poses in a single pass, their reliance on dense attention imposes a quadratic complexity, creating a prohibitive computational bottleneck that severely limits inference speed. To resolve this, we introduce Speed3R, an end-to-end trainable model inspired by the core principle of Structure-from-Motion: that a sparse set of keypoints is sufficient for robust pose estimation. Speed3R features a dual-branch attention mechanism where a compression branch creates a coarse contextual prior to guide a selection branch, which performs fine-grained attention only on the most informative image tokens. This strategy mimics the efficiency of traditional keypoint matching, achieving a remarkable 12.4x inference speedup on 1000-view sequences, while introducing a minimal, controlled trade-off in geometric accuracy. Validated on standard benchmarks with both VGGT and π³ backbones, our method delivers high-quality reconstructions at a fraction of computational cost, paving the way for efficient large-scale scene modeling.
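The dual-branch idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how tokens are compressed or selected, so the mean pooling over fixed-size blocks, the per-query top-k block selection, and the names `sparse_attention`, `block_size`, and `top_k` are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_attention(q, k, v):
    # Reference: every query attends to every key (quadratic cost).
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def sparse_attention(q, k, v, block_size=4, top_k=2):
    """Hypothetical two-branch sparse attention.

    q: (n_q, d); k, v: (n_k, d), with n_k divisible by block_size.
    Returns the attended values (n_q, d) and the chosen block ids (n_q, top_k).
    """
    n_q, d = q.shape
    n_blocks = k.shape[0] // block_size
    k_blocks = k.reshape(n_blocks, block_size, d)
    v_blocks = v.reshape(n_blocks, block_size, d)

    # Compression branch (assumed: mean pooling): one coarse key per block.
    k_coarse = k_blocks.mean(axis=1)                  # (n_blocks, d)
    coarse_scores = q @ k_coarse.T / np.sqrt(d)       # (n_q, n_blocks)

    # Selection branch: keep the top_k highest-scoring blocks per query.
    top_idx = np.argsort(-coarse_scores, axis=1)[:, :top_k]

    # Fine-grained attention restricted to tokens in the selected blocks.
    k_sel = k_blocks[top_idx].reshape(n_q, top_k * block_size, d)
    v_sel = v_blocks[top_idx].reshape(n_q, top_k * block_size, d)
    scores = np.einsum("qd,qnd->qn", q, k_sel) / np.sqrt(d)
    out = np.einsum("qn,qnd->qd", softmax(scores), v_sel)
    return out, top_idx


rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
k = rng.normal(size=(32, 16))
v = rng.normal(size=(32, 16))
out, chosen = sparse_attention(q, k, v, block_size=4, top_k=2)
```

In this sketch each query scores `n_blocks + top_k * block_size` keys instead of all `n_k`, which is where the savings come from when `top_k` is small. Setting `top_k` equal to the number of blocks recovers dense attention exactly, which makes a convenient sanity check.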

Cite

@article{arxiv.2603.08055,
  title  = {Speed3R: Sparse Feed-forward 3D Reconstruction Models},
  author = {Weining Ren and Xiao Tan and Kai Han},
  journal= {arXiv preprint arXiv:2603.08055},
  year   = {2026}
}

Comments

CVPR 2026 Findings, project page: https://visual-ai.github.io/speed3r/
