Research on an Improved YOLOv8 Object Detection Algorithm Based on MobileNetv4

  • Abstract: Lightweight convolutional neural networks designed for mobile devices offer fast inference but are constrained by locality: they can capture information only within a local window, which degrades performance. Self-attention can capture global information, but it reduces detection speed. To address both problems, this paper proposes a hardware-friendly network based on YOLOv8 that adopts the MobileNetv4 architecture. The structure introduces the universal inverted bottleneck (UIB) search block, which unifies the inverted bottleneck, ConvNext, the feed-forward network (FFN), and a novel extra depthwise convolution variant. It also introduces a dynamic upsampling operator that improves the upsampling operation and reduces the model's GPU memory usage and latency. In addition, the YOLOv8 detection head is improved by introducing a dynamic detection head, which combines spatial awareness, scale awareness, and task awareness in a single framework and applies the attention mechanism effectively within the detection head, improving detection performance and efficiency. Experimental results show that, compared with the second-best model, YOLOv8n, YOLOv8n_M improves mean average precision (mAP50~95) by 1.3%. In terms of model complexity, YOLOv8n_M reduces model size by 36% (1 million fewer parameters) and computational cost by 26% (2.4 fewer GFLOPs, i.e. giga floating-point operations). The proposed YOLOv8_M effectively reduces the model's parameter count and inference time while improving object detection accuracy across different environments to a certain extent.
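The way a single universal inverted bottleneck block can cover the four structures named in the abstract may be easier to see in code. The sketch below is illustrative only: it assumes the layout from the published MobileNetv4 description, in which two optional depthwise convolutions (one before the 1x1 expansion, one between expansion and projection) select the variant. The names `UIBConfig`, `uib_variant`, and `uib_layers` are hypothetical and are not taken from this paper or from any library; the actual YOLOv8n_M implementation may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UIBConfig:
    """Hypothetical switchboard for one universal inverted bottleneck block."""
    start_dw: bool   # depthwise conv placed before the 1x1 expansion
    middle_dw: bool  # depthwise conv placed between expansion and projection


def uib_variant(cfg: UIBConfig) -> str:
    """Name the structure selected by the two optional depthwise convs."""
    if cfg.start_dw and cfg.middle_dw:
        return "ExtraDW"             # both depthwise convs present
    if cfg.middle_dw:
        return "InvertedBottleneck"  # depthwise conv on the expanded features
    if cfg.start_dw:
        return "ConvNext"            # depthwise conv before the expansion
    return "FFN"                     # two stacked 1x1 pointwise convs only


def uib_layers(cfg: UIBConfig) -> List[str]:
    """List the layer sequence of the block in execution order."""
    layers = []
    if cfg.start_dw:
        layers.append("depthwise_conv")
    layers.append("pointwise_expand")
    if cfg.middle_dw:
        layers.append("depthwise_conv")
    layers.append("pointwise_project")
    return layers


if __name__ == "__main__":
    for start in (False, True):
        for middle in (False, True):
            cfg = UIBConfig(start_dw=start, middle_dw=middle)
            print(uib_variant(cfg), uib_layers(cfg))
```

Under these assumptions, an architecture search only has to choose two booleans per block, which is why the four structures can share one search block.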

     
