DOI: 10.3390/app16126242 ISSN: 2076-3417

Unveiling the Landscape of Human Pose Estimation

Jianjun Yang, Sankarshan Dasgupta, Wenjiao Liu, Ju Shen, Bryson R. Payne, Ying Luo, Ruixu Liu, Tam V. Nguyen

Human pose estimation (HPE) has advanced rapidly with deep learning, enabling a transition from specialized sensing and multi-view systems toward monocular RGB-based approaches. These developments have expanded applications in healthcare, robotics, sports analytics, and human–computer interaction. However, the growing diversity of deep learning paradigms, ranging from convolutional and recurrent models to graph-based and Transformer-based approaches, has resulted in a fragmented literature, making it difficult to compare methods systematically and to guide system design. This paper addresses this challenge by providing a comprehensive survey of deep learning-based monocular HPE methods published over the past decade and by introducing a unified modular framework. The proposed framework organizes HPE systems into six modular estimation paradigms: single-image-based estimation, multi-frame-based estimation, Top-Down pose estimation, Bottom-Up pose estimation, 2D-to-3D pose reconstruction, and direct 3D estimation. Each module is analyzed in terms of representative approaches, design trade-offs, and practical considerations, supported by algorithmic formulations that outline the computational pipeline at each stage. Unlike prior surveys that primarily catalog methods or report benchmark results in isolation, this work emphasizes how component-level design choices relate to overall system performance. The paper summarizes performance trends on benchmarks including Human3.6M, COCO, and MPII, highlights persistent challenges such as occlusion and viewpoint variation, and outlines future research directions including interaction-aware modeling, efficient deployment, and improved robustness under real-world conditions.
