Stereo Object Matching Network

Computer Vision and Pattern Recognition 2021-03-25 v1

Abstract

This paper presents a stereo object matching method that exploits both 2D contextual information from images and 3D object-level information. Unlike existing stereo matching methods, which focus exclusively on pixel-level correspondence between stereo images within a volumetric space (i.e., a cost volume), we exploit this volumetric structure in a different manner. Because the cost volume explicitly encodes 3D information along its disparity axis, it is a privileged structure for encapsulating 3D contextual information from objects. However, doing so is not straightforward, since disparity values map to the 3D metric space non-linearly. We therefore present two novel strategies for handling 3D objectness in the cost volume space: selective sampling (RoISelect) and 2D-3D fusion (fusion-by-occupancy). These allow us to seamlessly incorporate 3D object-level information and achieve accurate depth estimation near object boundaries. Our depth estimation achieves competitive performance on the KITTI and Virtual-KITTI 2.0 datasets.
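
The non-linear relation between disparity and metric depth mentioned in the abstract can be illustrated with the standard rectified-stereo formula Z = f * B / d. The sketch below is not taken from the paper and does not implement RoISelect or fusion-by-occupancy; the focal length, baseline, disparity range, and the helper name `disparity_to_depth` are illustrative assumptions chosen only to show that evenly spaced disparity bins cover very uneven depth intervals.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert disparity (pixels) to depth (metres) via Z = f * B / d.

    Hypothetical helper for illustration; assumes a rectified stereo pair.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    return focal_px * baseline_m / disparity


# Illustrative camera parameters (assumptions, not values from the paper).
focal_px = 720.0    # focal length in pixels
baseline_m = 0.5    # stereo baseline in metres

# Evenly spaced disparity bins, as along the disparity axis of a cost volume.
disparities = np.arange(1, 193, dtype=np.float64)
depths = disparity_to_depth(disparities, focal_px, baseline_m)

# Metric depth spanned by each one-pixel disparity step.
depth_step = depths[:-1] - depths[1:]

print("depth at d=1:", depths[0], "m")
print("depth span of first disparity step:", depth_step[0], "m")
print("depth span of last disparity step:", depth_step[-1], "m")
```

With these assumed parameters, one disparity step near the far end of the range spans many metres while one step near the camera spans under a centimetre, which is why a fixed-size 3D object occupies a varying number of disparity bins depending on its distance.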

Cite

@article{arxiv.2103.12498,
  title  = {Stereo Object Matching Network},
  author = {Jaesung Choe and Kyungdon Joo and Francois Rameau and In So Kweon},
  journal= {arXiv preprint arXiv:2103.12498},
  year   = {2021}
}

Comments

Accepted at ICRA 2021
