Improved Single Camera BEV Perception Using Multi-Camera Training

Computer Vision and Pattern Recognition 2026-02-20 v1

Abstract

Bird's Eye View (BEV) map prediction is essential for downstream autonomous driving tasks such as trajectory prediction. In the past, this was accomplished with a sophisticated sensor configuration that captured a surround view from multiple cameras. In large-scale production, however, cost efficiency is an optimization goal, which makes using fewer cameras more relevant. Fewer input images, in turn, come with a performance drop. This raises the problem of developing a BEV perception model that performs sufficiently well on a low-cost sensor setup. Although this cost restriction is primarily relevant at inference time on production cars, it is less problematic on a test vehicle during training. Therefore, the objective of our approach is to reduce the aforementioned performance drop as much as possible using a modern multi-camera surround view model reduced for single-camera inference. The approach includes three features: a modern masking technique, a cyclic Learning Rate (LR) schedule, and a feature reconstruction loss for supervising the transition from six-camera inputs to one-camera input during training. For single-camera inference, our method outperforms versions trained strictly with one camera or strictly with six-camera surround view, resulting in reduced hallucination and better quality of the BEV map.
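
The three training features named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact masking scheme, LR schedule shape, or loss, so everything below is an assumption for illustration: a triangular cyclic schedule stands in for the cyclic LR schedule, zeroing all but one camera's features stands in for masking, and a mean squared error between single-camera and six-camera features stands in for the feature reconstruction loss. The function names `cyclic_lr`, `mask_cameras`, and `reconstruction_loss` are hypothetical and not taken from the paper.

```python
from typing import List


def cyclic_lr(step: int, base_lr: float, max_lr: float, cycle_len: int) -> float:
    """Triangular cyclic learning rate (assumed shape, not the paper's schedule).

    Rises linearly from base_lr to max_lr over the first half of a cycle,
    then falls back to base_lr over the second half.
    """
    pos = step % cycle_len
    half = cycle_len / 2
    frac = pos / half if pos <= half else (cycle_len - pos) / half
    return base_lr + (max_lr - base_lr) * frac


def mask_cameras(features: List[List[float]], keep_index: int) -> List[List[float]]:
    """Zero the feature vectors of every camera except keep_index.

    Assumption: masking is modeled as replacing a camera's features with
    zeros; the paper's masking technique may differ.
    """
    return [
        list(f) if i == keep_index else [0.0] * len(f)
        for i, f in enumerate(features)
    ]


def reconstruction_loss(student: List[float], teacher: List[float]) -> float:
    """Mean squared error between two equally sized feature vectors.

    Assumption: 'student' features come from the single-camera pass and
    'teacher' features from the six-camera pass.
    """
    if len(student) != len(teacher):
        raise ValueError("feature vectors must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


if __name__ == "__main__":
    # Six hypothetical cameras with two-dimensional features each.
    six_cam = [[float(i), float(i) + 0.5] for i in range(6)]
    one_cam = mask_cameras(six_cam, keep_index=0)
    flat_six = [v for cam in six_cam for v in cam]
    flat_one = [v for cam in one_cam for v in cam]
    print("lr at step 3:", cyclic_lr(3, 1e-4, 1e-3, 10))
    print("reconstruction loss:", reconstruction_loss(flat_one, flat_six))
```

In this sketch the loss is computed directly on masked inputs for brevity; in a real model the two feature sets would come from the network's BEV encoder rather than from the raw per-camera vectors.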

Cite

@article{arxiv.2409.02676,
  title  = {Improved Single Camera BEV Perception Using Multi-Camera Training},
  author = {Daniel Busch and Ido Freeman and Richard Meyes and Tobias Meisen},
  journal= {arXiv preprint arXiv:2409.02676},
  year   = {2024}
}

Comments

This paper has been accepted to the 27th IEEE International Conference on Intelligent Transportation Systems (ITSC 2024).