Robust Consistent Video Depth Estimation

Computer Vision and Pattern Recognition, v2, 2021-06-23

Abstract

We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video. We integrate a learning-based depth prior, in the form of a convolutional neural network trained for single-image depth estimation, with geometric optimization to estimate a smooth camera trajectory as well as a detailed and stable depth reconstruction. Our algorithm combines two complementary techniques: (1) flexible deformation splines for low-frequency, large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details. In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing significant noise, shake, motion blur, and rolling-shutter deformations. Our method quantitatively outperforms state-of-the-art methods on the Sintel benchmark for both depth and pose estimation and attains favorable qualitative results across diverse in-the-wild datasets.
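As a rough illustration of technique (1), the sketch below assumes a deformation spline is a coarse grid of multiplicative depth-scale coefficients that is bilinearly interpolated to every pixel. Under that assumption, a single-image depth estimate can be bent smoothly, at low frequency, to agree with the rest of the video. The grid layout, the function names `bilinear_scale` and `deform_depth`, and the purely multiplicative model are illustrative assumptions, not the paper's implementation. In the full method the coefficients would be optimized jointly with the camera poses, and that optimization is not shown here.

```python
from typing import List

Grid = List[List[float]]


def bilinear_scale(grid: Grid, u: float, v: float) -> float:
    """Bilinearly interpolate a scale grid at continuous grid coordinates (u, v).

    Assumption: the grid has at least 2x2 control points.
    """
    gh, gw = len(grid), len(grid[0])
    # Clamp the top-left control point so that x0 + 1 and y0 + 1 stay in range.
    x0 = min(int(u), gw - 2)
    y0 = min(int(v), gh - 2)
    fx, fy = u - x0, v - y0
    top = (1 - fx) * grid[y0][x0] + fx * grid[y0][x0 + 1]
    bottom = (1 - fx) * grid[y0 + 1][x0] + fx * grid[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom


def deform_depth(depth: Grid, grid: Grid) -> Grid:
    """Scale each depth value by the spline evaluated at its pixel (hypothetical model).

    Pixel corners map onto grid corners, so the coarse grid spans the whole
    image. Assumption: the depth map is at least 2x2 pixels.
    """
    h, w = len(depth), len(depth[0])
    gh, gw = len(grid), len(grid[0])
    out: Grid = []
    for y in range(h):
        v = y * (gh - 1) / (h - 1)  # pixel row -> continuous grid row
        row = []
        for x in range(w):
            u = x * (gw - 1) / (w - 1)  # pixel column -> continuous grid column
            row.append(depth[y][x] * bilinear_scale(grid, u, v))
        out.append(row)
    return out


if __name__ == "__main__":
    flat = [[1.0] * 3 for _ in range(3)]
    print(deform_depth(flat, [[1.0, 2.0], [3.0, 4.0]]))
```

When every coefficient in the grid has the same value, the deformation reduces to a single global scale, which is the usual way to align scale-ambiguous monocular depth. A grid with more control points lets the correction vary smoothly across the image while still suppressing high-frequency changes, which is why fine details are left to a separate filtering step.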

Cite

@article{arxiv.2012.05901,
  title  = {Robust Consistent Video Depth Estimation},
  author = {Johannes Kopf and Xuejian Rong and Jia-Bin Huang},
  journal= {arXiv preprint arXiv:2012.05901},
  year   = {2021}
}

Comments

Project website: https://robust-cvd.github.io/