Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry

Computer Vision and Pattern Recognition 2025-09-30 v2 Robotics

Abstract

Monocular visual odometry is a key technology in a wide range of autonomous systems. Traditional feature-based methods often fail under poor lighting, insufficient texture, and large motions. In contrast, recent learning-based dense SLAM methods exploit iterative dense bundle adjustment to address such failure cases, and achieve robust and accurate localization in a wide variety of real environments without depending on domain-specific supervision. However, despite their potential, these methods still struggle in scenarios involving large motion and dynamic objects. In this study, we diagnose key weaknesses in a popular learning-based dense SLAM model (DROID-SLAM) by analyzing major failure cases on outdoor benchmarks and exposing shortcomings of its optimization process. We then propose using self-supervised priors from a frozen, large-scale pre-trained monocular depth estimator to initialize the dense bundle adjustment process, leading to robust visual odometry without the need to fine-tune the SLAM backbone. Despite its simplicity, the proposed method demonstrates significant improvements on KITTI odometry, as well as on the challenging DDAD benchmark.

Cite

@article{arxiv.2406.00929,
  title  = {Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry},
  author = {Takayuki Kanai and Igor Vasiljevic and Vitor Guizilini and Kazuhiro Shintani},
  journal= {arXiv preprint arXiv:2406.00929},
  year   = {2025}
}

Comments

Project page: https://toyotafrc.github.io/SGInit-Proj/
