Contextual Information and Large Language Model for Open-Vocabulary Indoor 3D Object Detection
Authors: Zhang Sheng, Cheng Jun
Affiliation: Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
CLC number: TP 183

Abstract:

Existing indoor three-dimensional (3D) object detection algorithms can detect only a limited set of object categories, which restricts their application in intelligent robotics. Open-vocabulary object detection can detect all objects of interest in a given scene without predefined object categories, thereby addressing this limitation. Meanwhile, the prior knowledge of large language models can significantly improve the performance of visual tasks. However, existing research on open-vocabulary indoor 3D object detection focuses only on object information and ignores contextual information. The input to indoor 3D object detection is mainly point cloud data, which is sparse and noisy, so relying on object information alone degrades the detection results. Contextual information includes scene descriptions and can complement object information, improving the accuracy of object category prediction. To this end, this paper proposes an open-vocabulary indoor 3D object detection algorithm based on contextual information and a large language model. The algorithm integrates contextual information and object information through a large language model and then performs chain-of-thought reasoning to obtain the detection results. The algorithm is validated on the SUN RGB-D and ScanNetV2 datasets, and the experimental results demonstrate its effectiveness.

Citing format
Zhang Sheng, Cheng Jun. Contextual Information and Large Language Model for Open-Vocabulary Indoor 3D Object Detection[J]. Journal of Integration Technology.

History
  • Received: 2024-12-01
  • Last revised: 2025-01-11
  • Accepted: 2025-01-15
  • Published online: 2025-02-13