A Text-to-Text Semantic Similarity Scoring Feature Extraction Method for Spontaneous Speech Evaluation

Abstract: Extracting text-to-text semantic similarity scoring features is a difficult problem in spontaneous speech evaluation. To address it, this paper uses the WordNet-based Lesk algorithm to compute the semantic similarity between words, defines a word-to-text semantic similarity algorithm on top of the word-to-word similarity, and proposes a complete WordNet-based method for extracting text-to-text semantic similarity scoring features. In the experiments, the method is used to extract the semantic similarity feature between students' answers and the standard answer, and the correlation between this feature and the teachers' scores is analyzed. The experimental results show that the algorithm effectively characterizes the text-to-text semantic similarity between the students' answers and the standard answer.
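The three-stage pipeline summarized in the abstract (word-to-word similarity, then word-to-text similarity, then a text-to-text feature) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's exact formulation: the small `TOY_GLOSSES` dictionary stands in for WordNet glosses, the Lesk-style overlap is normalized with a Dice coefficient, the word-to-text score is taken as the best word-to-word match, and the text-to-text score is a symmetric average. All function names are hypothetical.

```python
# Hypothetical toy glosses standing in for WordNet definitions (not from the paper).
TOY_GLOSSES = {
    "car": "a motor vehicle with four wheels used for transport on a road",
    "automobile": "a motor vehicle with four wheels",
    "bus": "a large motor vehicle used for public transport on a road",
    "banana": "a long curved yellow fruit",
    "apple": "a round fruit with red or green skin",
}
STOPWORDS = {"a", "an", "the", "with", "for", "on", "or", "of", "used"}


def gloss_tokens(word):
    """Return the set of content words in the toy gloss of `word`."""
    gloss = TOY_GLOSSES.get(word, "")
    return {tok for tok in gloss.split() if tok not in STOPWORDS}


def word_similarity(w1, w2):
    """Lesk-style gloss overlap, normalized to [0, 1] with a Dice coefficient."""
    if w1 == w2:
        return 1.0
    g1, g2 = gloss_tokens(w1), gloss_tokens(w2)
    if not g1 or not g2:
        return 0.0
    return 2 * len(g1 & g2) / (len(g1) + len(g2))


def word_text_similarity(word, text_words):
    """Assumed definition: the best match of `word` against any word in the text."""
    return max((word_similarity(word, w) for w in text_words), default=0.0)


def text_similarity(answer, reference):
    """Assumed definition: symmetric average of the word-to-text scores."""
    a_words = answer.lower().split()
    r_words = reference.lower().split()
    if not a_words or not r_words:
        return 0.0
    a_to_r = sum(word_text_similarity(w, r_words) for w in a_words) / len(a_words)
    r_to_a = sum(word_text_similarity(w, a_words) for w in r_words) / len(r_words)
    return (a_to_r + r_to_a) / 2


if __name__ == "__main__":
    # A related answer should score higher than an unrelated one.
    print(text_similarity("automobile", "car"))
    print(text_similarity("banana", "car"))
```

In a realistic setting the toy glosses would be replaced by WordNet glosses and the resulting score would serve as one feature to be correlated with the teachers' ratings; the normalization and aggregation choices here are only one plausible option.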