Alignment Regression Hand Pose Estimation Network Based on Focused Attention Mechanism
Affiliation:

1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences; 2. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; 3. Southern University of Science and Technology, Shenzhen Institutes of Advanced Technology

CLC Number:

TP 399

Fund Project:

National Natural Science Foundation of China, Shenzhen Key Program on Basic Research, Shenzhen Medical Research Special Program

    Abstract:

    Hand pose estimation from RGB images has important applications in dynamic gesture recognition and human-computer interaction. Existing methods, however, face several challenges: the hand is highly self-similar and its key points are densely distributed, which makes high-precision prediction difficult at low computational cost and limits performance in complex scenes. To address this, this paper proposes FAR-HandNet, a two-dimensional (2D) hand pose estimation model based on the YOLOv8 network. The model integrates a Focused Linear Attention module, a key point alignment strategy, and a regression residual fitting module. Together these strengthen feature capture for small target regions such as the hand and reduce the adverse effect of self-similarity on key point localization accuracy. The regression residual fitting module uses a flow-based generative model to fit the distribution of key point residuals, which improves the accuracy of the regression model. Experiments are conducted on the CMU and FreiHAND datasets. The results show that FAR-HandNet has advantages in parameter count and computational efficiency and achieves a higher PCK (Percentage of Correct Keypoints) at different thresholds than existing methods. The inference time of the model is only 32 ms. Ablation experiments further confirm the effectiveness of each module.
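
    For readers unfamiliar with the PCK metric named above, the following is a minimal sketch of how it is commonly computed: the fraction of predicted key points whose Euclidean distance to the ground truth falls within a threshold. The function name `pck`, the array shapes, and the optional per-sample normalization length `norm` are assumptions made for illustration; the abstract does not state which normalization or thresholds the paper uses.

```python
import numpy as np


def pck(pred, gt, threshold, norm=None):
    """Percentage of Correct Keypoints (illustrative sketch).

    pred, gt  : arrays of shape (N, K, 2) holding 2D key point coordinates.
    threshold : a key point counts as correct if its error is <= threshold.
    norm      : optional array of shape (N,) with a per-sample reference
                length (hypothetical, e.g. a bounding-box size); when given,
                errors are divided by it before thresholding.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Euclidean error of every key point, shape (N, K).
    dist = np.linalg.norm(pred - gt, axis=-1)
    if norm is not None:
        dist = dist / np.asarray(norm, dtype=float)[:, None]
    # Fraction of key points within the threshold, over all samples.
    return float(np.mean(dist <= threshold))


# Toy example: one sample, four key points with errors 0, 5, 1 and 10 pixels.
gt = np.zeros((1, 4, 2))
pred = np.array([[[0.0, 0.0], [3.0, 4.0], [0.0, 1.0], [6.0, 8.0]]])
print(pck(pred, gt, threshold=5.0))               # threshold in pixels
print(pck(pred, gt, threshold=0.2, norm=[10.0]))  # normalized threshold
```

    Evaluating this function over a range of thresholds gives a PCK curve, which is the usual way to compare methods "under different thresholds".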

History
  • Received: October 30, 2024
  • Revised: March 11, 2025
  • Adopted: March 12, 2025
  • Online: March 13, 2025