Abstract: Metaphor serves to inspire understanding and to persuade others. Metaphor is now increasingly expressed multimodally, through combinations of text, image, and video. Identifying the metaphorical semantics contained in multimodal content is therefore of research value for Internet content security. Because multimodal metaphor datasets are scarce, it is difficult for scholars to build research models, and most work has focused on text-based metaphor detection. To overcome this shortcoming, we first construct a new multimodal metaphor dataset, MDEI, built from the perspectives of image-text, metaphor appearance, emotion expression, and author intention. Then, Kappa scores are used to assess the consistency among the annotators of the dataset. Finally, to verify the quality and value of the dataset, we construct a multimodal metaphor detection model that combines image attribute features, image entity features, and text features with the help of a pre-trained model and an attention mechanism. The experimental results show that MDEI can improve the effectiveness of metaphor detection models and confirm that the interrelationship of multimodal information is helpful for understanding metaphor.
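
The annotator-consistency check mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which Kappa variant is used; the code below assumes Fleiss' kappa, which handles more than two annotators. The function name `fleiss_kappa` and the labels `"metaphorical"` and `"literal"` are hypothetical and are not taken from MDEI.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per annotator).

    Assumption: every item is labelled by the same number of annotators (>= 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of annotations")

    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: fraction of agreeing annotator pairs, averaged over items.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if p_expected == 1.0:
        # Only one category was ever used; agreement is trivially perfect.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical annotations: three annotators label four image-text samples.
sample = [
    ["metaphorical", "metaphorical", "metaphorical"],
    ["literal", "literal", "metaphorical"],
    ["literal", "literal", "literal"],
    ["metaphorical", "metaphorical", "literal"],
]
print(f"Fleiss' kappa: {fleiss_kappa(sample):.3f}")
```

Values near 1 indicate strong agreement, and values near 0 indicate agreement no better than chance. If only two annotators were involved, Cohen's kappa would be the more usual choice.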