Alignment Regression Hand Pose Estimation Network Based on Focused Attention Mechanism
Author:
Affiliation:

Author biography:

Corresponding author:

CLC number: TP399

Fund projects: National Natural Science Foundation of China (62373345); Shenzhen Medical Research Special Project (D2402013); Shenzhen Key Basic Research Project (JCYJ20220818101602005)




    Abstract:

    Hand pose estimation based on RGB images has broad application prospects in dynamic gesture recognition and human-computer interaction. However, existing methods face challenges such as the high self-similarity of hands and densely distributed keypoints, which make it difficult to achieve high-precision predictions at low computational cost and thus limit performance in complex scenarios. To address these challenges, this paper proposes a 2D hand pose estimation model named FAR-HandNet, built on the YOLOv8 network. The model integrates a focused linear attention module, a keypoint alignment strategy, and a regression residual fitting module, effectively enhancing feature capture for small target regions (e.g., hands) while mitigating the adverse effect of self-similarity on hand keypoint localization accuracy. Additionally, the regression residual fitting module leverages a flow-based generative model to fit the residual distribution of keypoints, significantly improving regression precision. Experiments were conducted on the Carnegie Mellon University (CMU) Panoptic dataset and the FreiHAND dataset. Results demonstrate that FAR-HandNet offers clear advantages in parameter count and computational efficiency, and achieves a superior percentage of correct keypoints under varying thresholds compared with existing methods, with an inference time of only 32 ms. Ablation studies further validate the effectiveness of each module, confirming the efficacy and superiority of FAR-HandNet in hand pose estimation tasks.
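The abstract does not give the internals of the focused linear attention module, but the general technique it names can be illustrated. The sketch below is a minimal single-head NumPy illustration under two assumptions that are not stated in the paper: that the module follows the standard linear-attention factorization (computing φ(K)ᵀV once instead of the full N×N attention map), and that the "focused" feature map sharpens token features by raising ReLU outputs to a power p while preserving each token's norm. The function name, the power p, and all shapes are hypothetical.

```python
import numpy as np

def focused_linear_attention(Q, K, V, p=3, eps=1e-6):
    """Hypothetical single-head sketch of focused linear attention.

    Linear attention replaces softmax(Q K^T) V with phi(Q) (phi(K)^T V),
    reducing cost from O(N^2 d) to O(N d^2) for N tokens of dimension d.
    The "focusing" map below sharpens each token's feature distribution
    (ReLU then element-wise power p) while rescaling so the feature norm
    is preserved -- an assumed form, not the paper's exact module.
    """
    def focus(x):
        x = np.maximum(x, 0.0) + eps          # ReLU, kept strictly positive
        xp = x ** p                           # sharpen dominant feature directions
        # rescale so each token row keeps its pre-power norm
        return xp * (np.linalg.norm(x, axis=-1, keepdims=True)
                     / (np.linalg.norm(xp, axis=-1, keepdims=True) + eps))

    Qf, Kf = focus(Q), focus(K)
    kv = Kf.T @ V                              # (d, d_v): aggregated once, no N x N map
    z = Qf @ Kf.sum(axis=0, keepdims=True).T   # (N, 1): per-token normalizer
    return (Qf @ kv) / (z + eps)

# Toy usage: 5 tokens with 4-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 4)) for _ in range(3))
out = focused_linear_attention(Q, K, V)
print(out.shape)  # (5, 4)
```

Because the key-value product is accumulated before the query is applied, memory and compute stay linear in the number of tokens, which is consistent with the abstract's emphasis on low parameter count and computational efficiency.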

Cite this article

Citing format
DOU Mingyang, GENG Yanjuan, YANG Jiabin. Alignment Regression Hand Pose Estimation Network Based on Focused Attention Mechanism [J]. Journal of Integration Technology, 2025, 14(3): 64-77.

History
  • Received: 2024-10-30
  • Revised: 2025-03-11
  • Accepted:
  • Online: 2025-05-09
  • Published: