Domain Context-Assisted Open-World Action Recognition
Author:
Affiliation:

1. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen; 2. Shanghai AI Laboratory; 3. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences

Funding:

National Key R&D Program of China (No. 2022ZD0160505) and the National Natural Science Foundation of China (Grant No. 62272450)

Ethical statement:

    Abstract:

    Effectively transferring knowledge from pre-trained models to downstream video understanding tasks is an important topic in computer vision research. Knowledge transfer becomes more challenging in the open world due to poor data conditions. Inspired by natural language processing, many recent multimodal pre-training models perform transfer learning through prompt learning. In this paper, we propose an LLM-powered, domain context-assisted open-world action recognition method that leverages the open-world understanding capabilities of large language models. By enriching action labels with contextual knowledge from a large language model, our approach aligns visual representations with multi-level descriptions of human actions for robust classification. In experiments on open-world action recognition under a fully supervised setting, we obtain a Top-1 accuracy of 71.86% on the ARID dataset and an mAP of 80.93% on the Tiny-VIRAT dataset. More importantly, our method achieves a Top-1 accuracy of 48.63% in source-free video domain adaptation and 54.36% in multi-source video domain adaptation.
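    The alignment step described in the abstract can be sketched as a CLIP-style zero-shot classifier: each action label is expanded into several LLM-generated descriptions (multiple "levels" of context, e.g. the action name, the typical scene, and the body motion), and a clip is assigned to the label whose description embeddings best match its visual embedding. The following is a minimal sketch under those assumptions; the feature dimensions, the three-level description scheme, and the mean-similarity fusion are illustrative choices, not the paper's exact formulation.

    ```python
    import numpy as np

    def l2_normalize(x, axis=-1):
        """Normalize vectors to unit length so dot products become cosine similarity."""
        return x / np.linalg.norm(x, axis=axis, keepdims=True)

    def classify_with_context(video_feat, label_desc_feats):
        """Pick the action label whose LLM-enriched descriptions best match the clip.

        video_feat: (d,) visual embedding of a video clip.
        label_desc_feats: dict mapping label -> (k, d) array of embeddings of k
            contextual descriptions of that action (hypothetical multi-level
            descriptions generated by an LLM).
        Returns the label with the highest mean cosine similarity across levels.
        """
        v = l2_normalize(video_feat)
        scores = {}
        for label, descs in label_desc_feats.items():
            t = l2_normalize(np.asarray(descs))
            # Fuse the k description levels by averaging their similarities.
            scores[label] = float(np.mean(t @ v))
        return max(scores, key=scores.get)
    ```

    In practice the video and text embeddings would come from a pre-trained vision-language encoder; averaging over description levels is one simple fusion strategy among several possible ones.
    
    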

History
  • Received:December 26,2023
  • Revised:December 26,2023
  • Accepted:
  • Online: March 25,2024
  • Published: