Alignment Regression Hand Pose Estimation Network Based on Focused Attention Mechanism
Author: Dou Mingyang, Geng Yanjuan, Yang Jiabin
Affiliation:

1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; 2. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; 3. Southern University of Science and Technology; Shenzhen Institutes of Advanced Technology

CLC Number:

TP 399

Fund Project:

National Natural Science Foundation of China (General Program, Key Program, Major Program), Shenzhen Key Program on Basic Research, Shenzhen Medical Research Special Program

Abstract:

Hand pose estimation from RGB images has important applications in dynamic gesture recognition and human-computer interaction. Existing methods, however, struggle with the high self-similarity of the hand and the dense distribution of its keypoints. These factors make high-precision prediction difficult at low computational cost and limit performance in complex scenes. To address this, this paper proposes FAR-HandNet, a two-dimensional (2D) hand pose estimation model built on the YOLOv8 network. The model integrates a focused linear attention module, a keypoint alignment strategy, and a regression residual fitting module. Together, these strengthen feature capture for small target regions such as the hand and reduce the adverse effect of self-similarity on keypoint localization accuracy. The regression residual fitting module uses a flow-based generative model to fit the distribution of keypoint residuals, which improves the accuracy of the regression model. Experiments are conducted on the CMU and FreiHAND datasets. The results show that FAR-HandNet has advantages in parameter count and computational efficiency, and that its PCK (Percentage of Correct Keypoints) at different thresholds improves significantly over existing methods. The inference time of the model is 32 ms. Ablation experiments confirm the effectiveness of each module and of FAR-HandNet as a whole for hand pose estimation.
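For readers unfamiliar with the PCK metric named in the abstract, the following is a minimal sketch of how it is commonly computed: a keypoint counts as correct when its Euclidean error is within a threshold times a reference length. The function name `pck`, the `norm_length` parameter, and the toy coordinates are illustrative assumptions; the abstract does not state which normalization length or thresholds the paper uses.

```python
import math


def pck(pred, gt, threshold, norm_length=1.0):
    """Fraction of keypoints whose error is within threshold * norm_length.

    pred, gt: equal-length sequences of (x, y) keypoints.
    norm_length: reference length (assumed here; e.g. a hand bounding-box size).
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and the same length")
    limit = threshold * norm_length
    hits = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= limit
    )
    return hits / len(gt)


# Toy data (hypothetical): four keypoints with errors of 0, 3, 5, and 10 pixels.
gt = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (40.0, 40.0)]
pred = [(10.0, 10.0), (23.0, 20.0), (33.0, 34.0), (40.0, 50.0)]

# Evaluate at several thresholds, with an assumed reference length of 100 px.
thresholds = [0.02, 0.06, 0.12]
scores = [pck(pred, gt, t, norm_length=100.0) for t in thresholds]
print(scores)
```

Sweeping the threshold as above is what reporting PCK "at different thresholds" amounts to: looser thresholds accept larger localization errors, so the score is non-decreasing in the threshold.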

Citing format

Dou Mingyang, Geng Yanjuan, Yang Jiabin. Alignment Regression Hand Pose Estimation Network Based on Focused Attention Mechanism[J]. Journal of Integration Technology.

History
  • Received: 2024-10-30
  • Last revised: 2025-03-11
  • Accepted: 2025-03-12
  • Published online: 2025-03-13