Open Domain Action Recognition Based on Domain Context Assistance

doi:10.12146/j.issn.2095-3135.20231226001

CLC number: TP183
Funding: National Key Research and Development Program of China (2022ZD0160505); National Natural Science Foundation of China (62272450)

Abstract:

Transferring knowledge from pre-trained models to downstream video understanding tasks is a key problem in computer vision research. In open-domain scenarios, unfavorable data conditions make this transfer more challenging. Inspired by natural language processing, many recent multimodal pre-trained models perform transfer learning by designing text prompts. Leveraging the ability of large language models to understand open domains, this paper proposes a domain-context-assisted method for open-domain action recognition that improves model understanding in open-domain scenarios. The method uses a large language model to enrich the contextual information of text labels and aligns visual representations with multi-level descriptions of human actions to achieve robust classification. Extensive action recognition experiments were conducted in open-domain scenarios. In the fully supervised setting, the method obtains a Top-1 accuracy of 71.86% on the ARID dataset and a mean average precision of 80.93% on the Tiny-VARIT dataset. In addition, it achieves a Top-1 accuracy of 48.63% in the source-free video domain adaptation setting and 54.36% in the multi-source video domain adaptation setting. These results demonstrate the effectiveness of domain context assistance across a variety of open-domain environments.
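
The idea of aligning a visual representation with several levels of text description per action can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the averaging of similarities across description levels, the temperature value, and the toy embeddings are all assumptions made for illustration. In practice the embeddings would come from a pre-trained video encoder and text encoder, and the descriptions would be generated by a large language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores, temperature=0.07):
    """Numerically stable softmax; the temperature is an assumed value."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def classify(video_embedding, class_descriptions):
    """Score each class by its mean similarity to the video embedding.

    class_descriptions maps a label to a list of text embeddings, one per
    description level (hypothetical levels: label name, scene context,
    motion detail). Averaging across levels is an assumption of this sketch.
    """
    labels = list(class_descriptions)
    scores = []
    for label in labels:
        sims = [cosine(video_embedding, d) for d in class_descriptions[label]]
        scores.append(sum(sims) / len(sims))
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))


# Toy 3-dimensional embeddings, made up for illustration only.
video = [0.9, 0.1, 0.2]
descriptions = {
    "walk": [[1.0, 0.0, 0.1], [0.8, 0.2, 0.3]],
    "sit": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.4]],
}
label, probs = classify(video, descriptions)
print(label)
```

Averaging over description levels means a class is favored only when the video agrees with its descriptions overall, not with a single phrasing. Other aggregation choices, such as a weighted sum or a maximum, would fit the same interface.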

Cite this article

XU Qinglin, QIAO Yu, WANG Yali. Open Domain Action Recognition Based on Domain Context Assistance[J]. Journal of Integration Technology, 2024, 13(6): 31-43


History

Received: 2023-12-26
Last revised: 2023-12-26
Published online: 2024-03-25
