Multi-disease Recognition Method for Fundus Images Based on Text Enhancement

doi:10.12146/j.issn.2095-3135.20240422001

Journal of Integration Technology, 2025, Vol. 14, No. 1, pp. 78-90

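CLC number: TP391, R77
Fund Project: This work is supported by the Shenzhen Science and Technology Innovation Commission (Shenzhen Technology Research Project, JSGG20220831105002004)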




Abstract:

In this work, a vision-language model is introduced into ophthalmic image disease recognition, and a multi-disease recognition algorithm based on a contrastive language-image pre-training (CLIP) model is proposed. First, a multi-label fundus image dataset, MDFCD8, containing 8 categories is constructed from several publicly available fundus image datasets. Second, the generative artificial intelligence model GPT-4 (Generative Pre-trained Transformer 4) is used to generate expert knowledge describing the fine-grained pathological features of fundus images. This addresses the lack of text labels in fundus image datasets. Finally, the average precision (AP), F1 score, and area under the receiver operating characteristic curve (AUC) are calculated, and the mean of the three is taken as the final performance evaluation metric. Experimental results show that the proposed method outperforms traditional convolutional neural networks and Transformer networks by 4.8% and 3.2%, respectively. Ablation experiments on each module further validate the effectiveness of the method and demonstrate the potential of vision-language models for computer-aided diagnosis of ophthalmic diseases.
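The final evaluation metric described above is the mean of three scores. The abstract does not state how each score is averaged over the 8 labels, or which decision threshold is used for F1. The sketch below is therefore a minimal, hypothetical illustration, not the authors' evaluation code. It assumes per-label (macro) averaging and a 0.5 threshold. The function names `average_precision`, `roc_auc`, `f1` and `final_score` are illustrative only.

```python
from typing import Dict, List, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP for one label: mean precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0  # assumption: a label with no positives contributes 0
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_pos


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y]
    neg = [s for s, y in zip(scores, y_true) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Binary F1 after thresholding the scores (the 0.5 threshold is an assumption)."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred:
            fp += 1  # predicted positive, actually negative
        elif y:
            fn += 1  # predicted negative, actually positive
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def final_score(labels: List[List[int]], probs: List[List[float]],
                threshold: float = 0.5) -> Dict[str, float]:
    """Macro-average AP, F1 and AUC over labels, then take the mean of the three."""
    n_labels = len(labels[0])
    aps, f1s, aucs = [], [], []
    for c in range(n_labels):
        y = [row[c] for row in labels]  # ground truth for label c
        s = [row[c] for row in probs]   # predicted probability for label c
        aps.append(average_precision(y, s))
        f1s.append(f1(y, s, threshold))
        aucs.append(roc_auc(y, s))
    result = {
        "AP": sum(aps) / n_labels,
        "F1": sum(f1s) / n_labels,
        "AUC": sum(aucs) / n_labels,
    }
    result["final"] = (result["AP"] + result["F1"] + result["AUC"]) / 3
    return result


# Toy example: 4 images, 2 labels (multi-label ground truth and predicted probabilities)
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
probs = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.1]]
print(final_score(labels, probs))
```

On this toy input, both labels rank every positive above every negative, so AP and AUC are 1. However, the third image's score of 0.4 for the second label falls below the threshold, which lowers that label's F1 to 2/3. The final score is therefore 17/18 (about 0.944). The threshold-free AP and AUC reward good ranking, while F1 also depends on the chosen operating point. This is one plausible reason to average the three scores.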



Cite this article
XIONG Shaokui, CHEN Shifeng. Multi-disease Recognition Method for Fundus Images Based on Text Enhancement[J]. Journal of Integration Technology, 2025, 14(1): 78-90


History
Received: 2024-04-22
Last revised: 2024-04-22
Published online: 2024-06-11
