Multi-disease Recognition Method for Fundus Images Based on Text Enhancement
Abstract: This study introduces a vision-language model into disease recognition from ophthalmic images and proposes a multi-disease recognition algorithm based on a contrastive language-image pre-training (CLIP) model. First, a multi-label fundus image dataset with 8 categories, MDFCD8, is constructed from several publicly available fundus image datasets. Second, the generative artificial intelligence model GPT-4 (Generative Pre-trained Transformer 4) is used to generate expert knowledge describing the fine-grained pathological features of fundus images, which addresses the lack of text labels in fundus image datasets. Finally, the average precision (AP), F1 score, and area under the receiver operating characteristic curve (AUC) are computed, and the mean of the three is used as the final performance metric. Experimental results show that the proposed method outperforms conventional convolutional neural networks and Transformer networks by 4.8% and 3.2%, respectively. Ablation experiments on each module further validate the effectiveness of the method and demonstrate the potential of vision-language models for computer-aided diagnosis of ophthalmic diseases.
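The final evaluation metric, the mean of AP, F1, and AUC over a multi-label dataset, can be sketched in NumPy as follows. This is a minimal illustration under assumptions the abstract does not state. AP and AUC are computed per class and macro-averaged. F1 is macro-averaged after thresholding scores at 0.5. Tied scores receive no special handling in AP. The function names (`average_precision`, `roc_auc`, `f1`, `final_score`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def average_precision(y_true, y_score):
    """AP for one class: mean precision at the rank of each positive sample.

    Assumption: ties in y_score are broken by a stable sort, not averaged.
    """
    order = np.argsort(-y_score, kind="mergesort")
    y = y_true[order]
    n_pos = y.sum()
    if n_pos == 0:
        return float("nan")  # AP is undefined for a class with no positives
    precision_at_k = np.cumsum(y) / np.arange(1, len(y) + 1)
    return float((precision_at_k * y).sum() / n_pos)


def roc_auc(y_true, y_score):
    """AUC for one class: probability that a random positive outranks a random negative."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # AUC needs both classes present
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count as half a win
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def f1(y_true, y_pred):
    """F1 for one class from binary predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


def final_score(Y_true, Y_score, threshold=0.5):
    """Mean of macro AP, macro F1, and macro AUC (assumed averaging scheme).

    Y_true: (n_images, n_classes) binary labels; Y_score: matching scores in [0, 1].
    """
    Y_true = np.asarray(Y_true, dtype=int)
    Y_score = np.asarray(Y_score, dtype=float)
    Y_pred = (Y_score >= threshold).astype(int)
    classes = range(Y_true.shape[1])
    ap = np.nanmean([average_precision(Y_true[:, c], Y_score[:, c]) for c in classes])
    auc = np.nanmean([roc_auc(Y_true[:, c], Y_score[:, c]) for c in classes])
    f1_macro = np.mean([f1(Y_true[:, c], Y_pred[:, c]) for c in classes])
    return {
        "mAP": float(ap),
        "F1": float(f1_macro),
        "AUC": float(auc),
        "final": float((ap + f1_macro + auc) / 3),
    }
```

For the 8 MDFCD8 categories, `Y_true` and `Y_score` would have shape `(n_images, 8)`. Whether the paper uses macro or micro averaging, or a different decision threshold, is not specified in the abstract, and either choice can shift the reported numbers.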