This paper presents a framework that combines traditional keypoint-based camera pose optimization with an invertible neural rendering mechanism. Our proposed 3D scene representation, Nerfels, is locally dense yet globally sparse. As opposed to existing invertible neural rendering systems, which overfit a model to the entire scene, we adopt a feature-driven approach for representing scene-agnostic, local 3D patches with renderable codes. By modelling a scene only where local features are detected, our framework effectively generalizes to unseen local regions in the scene via an optimizable code conditioning mechanism in the neural renderer, all while maintaining the low memory footprint of a sparse 3D map representation. Our model can be incorporated into existing state-of-the-art hand-crafted and learned local feature pose estimators, yielding improved performance on ScanNet in wide camera baseline scenarios.
@article{arxiv.2206.01916,
title = {Nerfels: Renderable Neural Codes for Improved Camera Pose Estimation},
author = {Gil Avraham and Julian Straub and Tianwei Shen and Tsun-Yi Yang and Hugo Germain and Chris Sweeney and Vasileios Balntas and David Novotny and Daniel DeTone and Richard Newcombe},
journal = {arXiv preprint arXiv:2206.01916},
year = {2022}
}