Multi-Scale Supervised Network for Human Pose Estimation

Computer Vision and Pattern Recognition 2018-08-07 v1

Abstract

Human pose estimation is an important topic in computer vision with many applications, including gesture and activity recognition. However, pose estimation from images is challenging due to appearance variations, occlusions, cluttered backgrounds, and complex activities. To alleviate these problems, we develop a robust pose estimation method based on recent deep conv-deconv modules with two improvements: (1) multi-scale supervision of body keypoints, and (2) a global regression to improve the structural consistency of keypoints. We refine keypoint detection heatmaps using layer-wise multi-scale supervision to better capture local context. Pose inference via keypoint association is then optimized globally by a final regression network. Our method can effectively disambiguate keypoint matches in close proximity, including mismatches of left and right body parts, and can better infer occluded parts. Experimental results show that our method achieves performance competitive with state-of-the-art methods on the MPII and FLIC datasets.
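As a rough illustration of what multi-scale supervision of keypoint heatmaps could look like, the sketch below builds one ground-truth heatmap per resolution and sums a per-scale loss. The abstract does not specify the target shape, the loss, or the scales, so the Gaussian targets, the mean squared error, the scale factors (1, 2, 4, 8), and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=1.0):
    """Hypothetical ground-truth heatmap: a 2D Gaussian centered at (cx, cy)."""
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def mse(pred, target):
    """Mean squared error between two equally sized 2D grids."""
    count = len(pred) * len(pred[0])
    total = sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    return total / count


def multi_scale_targets(keypoint, full_size, scales=(1, 2, 4, 8), sigma=1.0):
    """One target heatmap per scale; keypoint coordinates shrink with the map.

    The scale factors are an assumption, not taken from the paper.
    """
    x, y = keypoint
    targets = []
    for s in scales:
        size = full_size // s
        targets.append(gaussian_heatmap(size, size, x / s, y / s, sigma))
    return targets


def multi_scale_loss(predictions, targets):
    """Sum of per-scale losses, so every resolution is supervised."""
    return sum(mse(p, t) for p, t in zip(predictions, targets))


# Usage: a keypoint at (x=32, y=16) in a 64x64 heatmap.
targets = multi_scale_targets((32, 16), full_size=64)
zero_predictions = [[[0.0] * len(t[0]) for _ in t] for t in targets]
loss = multi_scale_loss(zero_predictions, targets)
```

Summing the per-scale terms means a prediction is penalized at coarse resolutions (broad context) as well as at the finest one (precise localization). In a real network the predictions would come from the matching deconvolution layers rather than from fixed grids.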

Cite

@article{arxiv.1808.01623,
  title  = {Multi-Scale Supervised Network for Human Pose Estimation},
  author = {Lipeng Ke and Ming-Ching Chang and Honggang Qi and Siwei Lyu},
  journal= {arXiv preprint arXiv:1808.01623},
  year   = {2018}
}

Comments

Accepted by ICIP2018. arXiv admin note: text overlap with arXiv:1803.09894
