Abstract:
Lightweight convolutional neural networks designed for mobile devices offer fast inference but are constrained by their inherent locality: information can only be captured within a local window, which degrades performance. Introducing a self-attention mechanism can capture global information, but it reduces detection speed. To address these issues, this paper introduces a hardware-friendly MobileNetV4 backbone into YOLOv8, built on the universal inverted bottleneck (UIB) block, which unifies the inverted bottleneck, the ConvNeXt block, the feed-forward network, and a novel extra-depthwise convolution variant. In addition, a dynamic upsampling operator is introduced to improve the upsampling operation, reducing the model's GPU memory usage and latency. Furthermore, the YOLOv8 detection head is enhanced with a dynamic detection head that combines spatial awareness, scale awareness, and task awareness in a unified framework; it applies the attention mechanism effectively within the object detection head, improving both detection performance and efficiency. Experimental results show that, compared with the baseline YOLOv8n, the proposed YOLOv8n_M improves mean Average Precision (mAP@0.5:0.95) by 1.3%. In terms of model complexity, YOLOv8n_M compresses the model size by 36% (a reduction of 1 million parameters) and cuts computational cost by 26% (a reduction of 2.4 GFLOPs). The proposed YOLOv8n_M thus reduces the model's parameter count and inference time while improving object detection accuracy across various environments.