Abstract: To strengthen the ability of lightweight human pose estimation networks to extract information and fuse features from feature maps at different stages, and to improve the post-processing of keypoint heatmaps and classification feature maps, this paper proposes a human pose estimation network based on multi-stage, multi-level feature fusion. First, a multi-level feature fusion module is designed to improve the model's ability to extract and aggregate information from feature maps. Second, a feature fusion branch built on this module preserves information from different stages of the model, which would otherwise be lost over long sequences of convolution operations. Finally, the keypoint classification maps output by the model are post-processed with a classification loss enhancement module, so that the model focuses more closely on the keypoint classification task and produces more accurate outputs. On the CrowdPose dataset, the proposed algorithm and the LitePose algorithm achieve AP values of 50.7% and 48.4%, respectively, under the XS structure, and 59.1% and 58.3% under the S structure. On the MS COCO val2017 dataset, the corresponding AP values are 41.9% and 40.6% under the XS structure, and 57.0% and 56.8% under the S structure. The experimental results indicate that the proposed multi-level feature fusion module, high-resolution feature fusion branch, and post-processing operations contribute to improving the detection performance of the human pose estimation network.
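To make the reported comparison easier to read, the short sketch below computes the absolute AP gain of the proposed algorithm over LitePose for each dataset and structure. The AP values are copied from the abstract. The names `AP_RESULTS` and `ap_gains` are hypothetical helpers introduced only for this illustration and are not part of the proposed method.

```python
# AP values (%) copied from the abstract, stored as (proposed, LitePose).
# AP_RESULTS and ap_gains are illustrative names, not part of the method.
AP_RESULTS = {
    ("CrowdPose", "XS"): (50.7, 48.4),
    ("CrowdPose", "S"): (59.1, 58.3),
    ("MS COCO val2017", "XS"): (41.9, 40.6),
    ("MS COCO val2017", "S"): (57.0, 56.8),
}


def ap_gains(results):
    """Return the absolute AP gain in percentage points per (dataset, structure)."""
    return {
        key: round(proposed - baseline, 1)
        for key, (proposed, baseline) in results.items()
    }


if __name__ == "__main__":
    for (dataset, structure), gain in ap_gains(AP_RESULTS).items():
        print(f"{dataset:16s} {structure:2s} +{gain:.1f} AP")
```

Under these figures the gains are 2.3 and 0.8 points on CrowdPose and 1.3 and 0.2 points on MS COCO val2017 (XS and S, respectively), so on both datasets the improvement is larger for the smaller XS structure.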