Domain Context-Assisted Open-World Action Recognition
Author: Qinglin Xu, Yu Qiao, Yali Wang
Affiliation:

1. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen; 2. Shanghai AI Laboratory

CLC number:

TP183

Fund Project:

National Key R&D Program of China (No. 2022ZD0160505); National Natural Science Foundation of China (Grant No. 62272450)

Abstract:

Effectively transferring knowledge from pre-trained models to downstream video understanding tasks is a key problem in computer vision research. In open-world scenarios, knowledge transfer becomes more challenging because of adverse data conditions. Inspired by natural language processing, many recent multimodal pre-training models perform transfer learning by designing text prompts. This paper proposes a domain context-assisted open-world action recognition method that leverages the open-world understanding capability of large language models. By injecting domain prior knowledge into a large language model to enrich the contextual knowledge of action labels, the method aligns visual representations with multi-level descriptions of human actions and achieves robust classification. In extensive open-world action recognition experiments, the method obtains a Top-1 accuracy of 71.86% on the ARID dataset and a mean average precision (mAP) of 80.93% on the Tiny-VARIT dataset under the fully supervised setting, a Top-1 accuracy of 48.63% under source-free video domain adaptation, and 54.36% under multi-source video domain adaptation, demonstrating the effectiveness of domain context assistance across various adaptation settings.
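To make the label-enrichment idea concrete, the sketch below shows one plausible realization, not the authors' implementation: each action label is expanded into multi-level textual descriptions (hand-written stand-ins here for LLM-generated, domain-aware text), and a clip is classified by cosine similarity between its visual embedding and the description embeddings, CLIP-style. The encoder, the prompt texts, and the averaging rule are all illustrative assumptions.

import torch
import torch.nn.functional as F

# Hypothetical multi-level descriptions per action label. In practice
# these would come from querying a large language model with domain
# context (e.g., "videos recorded in the dark" for a dataset like ARID).
LABEL_CONTEXTS = {
    "drinking": [
        "a person drinking",                                       # label level
        "a person raising a cup to their mouth",                   # action level
        "in a dimly lit room, a person lifts a bottle and sips",   # domain level
    ],
    "waving": [
        "a person waving",
        "a person moving one raised hand from side to side",
        "in low-light footage, a person greets someone by waving",
    ],
}

def classify(video_feat, encode_text):
    """Score each label by the mean cosine similarity between the video
    embedding and the embeddings of its multi-level descriptions."""
    video_feat = F.normalize(video_feat, dim=-1)                   # (D,)
    best_label, best_score = None, float("-inf")
    for label, descriptions in LABEL_CONTEXTS.items():
        text_feats = F.normalize(encode_text(descriptions), dim=-1)  # (K, D)
        score = (text_feats @ video_feat).mean().item()
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy usage: a random "encoder" stands in for a real vision-language
# text encoder such as CLIP's (an assumption, not the paper's code).
if __name__ == "__main__":
    torch.manual_seed(0)
    dummy_encode = lambda texts: torch.randn(len(texts), 512)
    print(classify(torch.randn(512), dummy_encode))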

Citing format

Qinglin Xu, Yu Qiao, Yali Wang. Domain Context-Assisted Open-World Action Recognition [J]. Journal of Integration Technology.

History
  • Received: 2023-12-26
  • Last revised: 2023-12-26
  • Published online: 2024-03-25