Deep Visual Odometry with Events and Frames
Abstract
Visual Odometry (VO) is crucial for autonomous robotic navigation, especially in GPS-denied environments such as planetary terrains. To improve robustness, recent model-based VO systems have begun combining standard and event-based cameras. Event cameras excel in low-light conditions and during high-speed motion, while standard cameras provide dense, easier-to-track features. However, image- and event-based VO still relies predominantly on model-based methods and has yet to fully integrate recent image-only advances that use end-to-end learned architectures. Seamlessly fusing the two modalities remains challenging because of their different nature: event cameras are asynchronous, while standard cameras are synchronous. This mismatch limits the potential for more effective image- and event-based VO. We introduce RAMP-VO, the first end-to-end learned image- and event-based VO system. It leverages novel Recurrent, Asynchronous, and Massively Parallel (RAMP) encoders that fuse asynchronous events with image data, providing 8x faster inference and 33% more accurate predictions than existing solutions. Despite being trained only in simulation, RAMP-VO outperforms previous methods on the newly introduced Apollo and Malapert datasets and on existing benchmarks, where it improves on image-based and event-based methods by 58.8% and 30.6%, respectively, paving the way for robust, asynchronous VO in space.
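Events arrive asynchronously, one at a time, whereas image frames arrive at discrete timestamps. One common, generic way to put the two on a shared time axis is to accumulate the events that fall between consecutive frames into a per-polarity count image. The sketch below illustrates only that idea and is not the RAMP encoder described above. The function name `events_between_frames`, the `(t, x, y, p)` column layout, and the 0/1 polarity encoding are all illustrative assumptions.

```python
import numpy as np

def events_between_frames(events, frame_ts, height, width):
    """Hypothetical helper: bin an asynchronous event stream per frame interval.

    Assumptions (not from the paper):
      events   -- array of shape (N, 4), columns (t, x, y, p), sorted by t;
                  p is polarity encoded as 0 (OFF) or 1 (ON).
      frame_ts -- sorted 1-D array of frame timestamps on the same clock.
    Returns an array of shape (len(frame_ts) - 1, 2, height, width) where
    entry [k, p, y, x] counts events of polarity p at pixel (x, y) with
    frame_ts[k] <= t < frame_ts[k + 1]. Events outside all intervals are dropped.
    """
    t = events[:, 0]
    # First event index with t >= each frame timestamp.
    bounds = np.searchsorted(t, frame_ts, side="left")
    out = np.zeros((len(frame_ts) - 1, 2, height, width), dtype=np.float32)
    for k in range(len(frame_ts) - 1):
        chunk = events[bounds[k]:bounds[k + 1]]
        x = chunk[:, 1].astype(np.int64)
        y = chunk[:, 2].astype(np.int64)
        p = chunk[:, 3].astype(np.int64)
        # Unbuffered add so repeated pixels are counted correctly.
        np.add.at(out[k], (p, y, x), 1.0)
    return out

events = np.array([[0.5, 1, 0, 1],
                   [1.2, 2, 1, 0],
                   [1.8, 2, 1, 0],
                   [2.5, 0, 0, 1]])
frame_ts = np.array([0.0, 1.0, 2.0])
h = events_between_frames(events, frame_ts, height=2, width=3)
```

Each slice `h[k]` can then be paired with the frame at `frame_ts[k + 1]`. This fixed binning discards the fine timing that event cameras provide, which is part of what asynchronous encoders aim to preserve.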
Cite
@article{arxiv.2309.09947,
title = {Deep Visual Odometry with Events and Frames},
author = {Roberto Pellerito and Marco Cannici and Daniel Gehrig and Joris Belhadj and Olivier Dubois-Matra and Massimo Casasco and Davide Scaramuzza},
journal = {arXiv preprint arXiv:2309.09947},
year = {2024}
}
Comments
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024