Abstract: This work introduces the vision-language modeling paradigm to ophthalmic image disease recognition and proposes a multi-disease recognition algorithm based on a contrastive language-image pre-training (CLIP) model. First, a new multi-label fundus image dataset, MDFCD8, covering 8 categories, is constructed from several publicly available fundus image datasets. Then, the generative artificial intelligence model GPT-4 is used to generate expert knowledge describing the fine-grained pathological features of fundus images, which addresses the lack of text labels in fundus image datasets. Experimental results show that the proposed method outperforms a traditional convolutional neural network and a Transformer network by 4.8% and 3.2%, respectively. Ablation experiments on each module validate the effectiveness of the method, and the results demonstrate the potential of vision-language modeling in ophthalmic disease research.