Domain Context-Assisted Open-World Action Recognition
Affiliation: 1. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen; 2. Shanghai AI Laboratory; 3. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences

CLC Number: TP183

Fund Project: National Key R&D Program of China (No. 2022ZD0160505); National Natural Science Foundation of China (Grant No. 62272450)

Abstract:

Effectively transferring knowledge from pre-trained models to downstream video understanding tasks is an important topic in computer vision research. Knowledge transfer becomes more challenging in open-world settings due to poor data conditions. Inspired by natural language processing, many recent multimodal pre-training models perform transfer learning through prompt learning. In this paper, we propose an LLM-powered, domain context-assisted open-world action recognition method that leverages the open-world understanding capabilities of large language models. By enriching action labels with contextual knowledge from a large language model, our approach aligns visual representations with multi-level descriptions of human actions for robust classification. In fully supervised open-world action recognition experiments, we obtain a Top-1 accuracy of 71.86% on the ARID dataset and an mAP of 80.93% on the Tiny-VARIT dataset. More importantly, our method achieves a Top-1 accuracy of 48.63% in source-free video domain adaptation and 54.36% in multi-source video domain adaptation.
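The abstract describes the mechanism only at a high level, so the following is a minimal illustrative sketch of the label-enrichment-plus-alignment idea, not the authors' implementation: each action label is expanded into multi-level textual descriptions (hand-written stand-ins here for LLM output), the descriptions are encoded into one text prototype per class, and video features are classified by cosine similarity against those prototypes in a CLIP-style manner. All names (describe_action, StubTextEncoder, ContextAssistedClassifier), the three description levels, and the embedding size are hypothetical assumptions.

```python
import torch
import torch.nn.functional as F


def describe_action(label: str) -> list[str]:
    # Hypothetical stand-in for LLM-generated, multi-level action context.
    # The paper would obtain such descriptions by prompting a large language
    # model with the raw label plus domain context; the three levels here
    # (label / motion / scene) are illustrative only.
    return [
        f"a video of a person performing the action: {label}",
        f"the body movements typically involved in {label}",
        f"the scene and objects usually present during {label}",
    ]


class StubTextEncoder:
    # Deterministic stand-in for a real text tower (e.g., a CLIP text
    # encoder) so the sketch runs end to end.
    def __call__(self, texts: list[str]) -> torch.Tensor:
        embs = []
        for t in texts:
            g = torch.Generator().manual_seed(hash(t) % (2**31))
            embs.append(torch.randn(512, generator=g))
        return torch.stack(embs)  # (len(texts), 512)


class ContextAssistedClassifier(torch.nn.Module):
    # CLIP-style zero-shot head: one text prototype per class, built from
    # the mean embedding of that class's multi-level descriptions, matched
    # to video features by cosine similarity.
    def __init__(self, text_encoder, labels: list[str]):
        super().__init__()
        with torch.no_grad():
            protos = [text_encoder(describe_action(l)).mean(dim=0) for l in labels]
        self.register_buffer("prototypes", F.normalize(torch.stack(protos), dim=-1))

    def forward(self, video_feat: torch.Tensor) -> torch.Tensor:
        # video_feat: (B, D) clip-level features from a frozen visual encoder.
        v = F.normalize(video_feat, dim=-1)
        return v @ self.prototypes.T  # (B, num_classes) similarity logits


labels = ["drinking", "jumping", "waving"]
clf = ContextAssistedClassifier(StubTextEncoder(), labels)
logits = clf(torch.randn(2, 512))  # two dummy video features
print(logits.argmax(dim=-1))       # predicted class index per clip
```

In practice, the stub encoder would be replaced by the pre-trained text tower of a multimodal backbone, and the video features would come from its paired visual encoder, so text and video embeddings share one space.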

History
  • Received: December 26, 2023
  • Revised: December 26, 2023
  • Online: March 25, 2024