DeepCap: Monocular Human Performance Capture Using Weak Supervision

Computer Vision and Pattern Recognition 2020-03-19 v1

Abstract

Human performance capture is a highly important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups or did not recover dense space-time coherent geometry with frame-to-frame correspondences. We propose a novel deep learning approach for monocular dense human performance capture. Our method is trained in a weakly supervised manner based on multi-view supervision, completely removing the need for training data with 3D ground-truth annotations. The network architecture is based on two separate networks that disentangle the task into a pose estimation step and a non-rigid surface deformation step. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art in terms of quality and robustness.

Cite

@article{arxiv.2003.08325,
  title  = {DeepCap: Monocular Human Performance Capture Using Weak Supervision},
  author = {Marc Habermann and Weipeng Xu and Michael Zollhoefer and Gerard Pons-Moll and Christian Theobalt},
  journal= {arXiv preprint arXiv:2003.08325},
  year   = {2020}
}