Related papers: Self-Constrained Inference Optimization on Structu…
Recently, human pose estimation mainly focuses on how to design a more effective and better deep network structure as human features extractor, and most designed feature extraction networks only introduce the position of each anatomical…
Human pose estimation is an important topic in computer vision with many applications including gesture and activity recognition. However, pose estimation from image is challenging due to appearance variations, occlusions, clutter…
Human pose estimation (i.e., locating the body parts / joints of a person) is a fundamental problem in human-computer interaction and multimedia applications. Significant progress has been made based on the development of depth sensors,…
Estimating 3D human poses from a monocular video is still a challenging task. Many existing methods' performance drops when the target person is occluded by other objects, or the motion is too fast/slow relative to the scale and speed of…
Multi-person pose estimation in images and videos is an important yet challenging task with many applications. Despite the large improvements in human pose estimation enabled by the development of convolutional neural networks, there still…
We develop a robust multi-scale structure-aware neural network for human pose estimation. This method improves the recent deep conv-deconv hourglass models with four key improvements: (1) multi-scale supervision to strengthen contextual…
Human pose estimation aims at localizing human anatomical keypoints or body parts in the input data (e.g., images, videos, or signals). It forms a crucial component in enabling machines to have an insightful understanding of the behaviors…
In this paper we consider the problem of human pose estimation from a single still image. We propose a novel approach where each location in the image votes for the position of each keypoint using a convolutional neural net. The voting…
We propose a new self-supervised method for predicting 3D human body pose from a single image. The prediction network is trained from a dataset of unlabelled images depicting people in typical poses and a set of unpaired 2D poses. By…
The typical bottom-up human pose estimation framework includes two stages, keypoint detection and grouping. Most existing works focus on developing grouping algorithms, e.g., associative embedding, and pixel-wise keypoint regression that we…
Current methods of multi-person pose estimation typically treat the localization and the association of body joints separately. It is convenient but inefficient, leading to additional computation and a waste of time. This paper, however,…
Single-stage multi-person pose estimation aims to jointly perform human localization and keypoint prediction within a unified framework, offering advantages in inference efficiency and architectural simplicity. Consequently, multi-scale…
Like many computer vision problems, human pose estimation is a challenging problem in that recognizing a body part requires not only information from local area but also from areas with large spatial distance. In order to spatially pass…
Human pose estimation from image and video is a vital task in many multimedia applications. Previous methods achieve great performance but rarely take efficiency into consideration, which makes it difficult to implement the networks on…
Multi-frame human pose estimation in complicated situations is challenging. Although state-of-the-art human joints detectors have demonstrated remarkable results for static images, their performances come short when we apply these models to…
Human pose estimation and tracking are fundamental tasks for understanding human behaviors in videos. Existing top-down framework-based methods usually perform three-stage tasks: human detection, pose estimation and tracking. Although…
A central challenge in human pose estimation, as well as in many other machine learning and prediction tasks, is the generalization problem. The learned network does not have the capability to characterize the prediction error, generate…
For human pose estimation in monocular images, joint occlusions and overlapping upon human bodies often result in deviated pose predictions. Under these circumstances, biologically implausible pose predictions may be produced. In contrast,…
Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress that has been made in recent years. Generally, the performance of existing methods drops when the target person is too small/large, or…
Human pose estimation deeply relies on visual clues and anatomical constraints between parts to locate keypoints. Most existing CNN-based methods do well in visual representation, however, lacking in the ability to explicitly learn the…