The Research of Short Texts Classification Based on Association Rules of Lexical Items

  • Abstract: Short texts, which make up the bulk of content on microblogs and other social media, are brief and have sparse features, so traditional text classification methods cannot classify them with high accuracy. To address this problem, this paper proposes a short text classification method based on association rules of lexical items. Strong association rules are first mined from the training set and then added to the features of each short text, which increases feature density and in turn improves classification accuracy. Comparative experiments show that the method mitigates, to some extent, the effect of feature sparseness on the classification results, and that it improves classification accuracy, recall, and F1 score.
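The two steps named in the abstract (mining strong association rules from the training set, then adding them to a short text's features) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the rule form, the thresholds, or how rules enter the feature vector, so the sketch assumes single-term rules a -> b, hypothetical `min_support` and `min_confidence` thresholds, and expansion by appending the consequent term. The function names and the toy data are likewise invented for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_strong_rules(docs, min_support=0.4, min_confidence=0.8):
    """Mine single-term rules a -> b from tokenized documents.

    Assumption: a rule is "strong" when the pair's support and the rule's
    confidence both reach the given (hypothetical) thresholds.
    Returns a dict mapping each antecedent to a set of consequents.
    """
    n = len(docs)
    term_count = Counter()
    pair_count = Counter()
    for doc in docs:
        terms = sorted(set(doc))  # count each term once per document
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))

    rules = {}
    for (a, b), count in pair_count.items():
        if count / n < min_support:
            continue
        # Check both directions, since confidence is not symmetric.
        for antecedent, consequent in ((a, b), (b, a)):
            if count / term_count[antecedent] >= min_confidence:
                rules.setdefault(antecedent, set()).add(consequent)
    return rules


def expand_features(doc, rules):
    """Append the consequents of matching rules to a document's terms."""
    expanded = list(doc)
    present = set(doc)
    for term in doc:
        for consequent in sorted(rules.get(term, ())):
            if consequent not in present:
                expanded.append(consequent)
                present.add(consequent)
    return expanded


# Toy training set (invented for illustration only).
train = [
    ["stock", "market", "rise"],
    ["stock", "market", "fall"],
    ["stock", "market", "bank"],
    ["match", "goal", "team"],
    ["match", "team", "coach"],
]
rules = mine_strong_rules(train)
expanded = expand_features(["stock", "today"], rules)
```

Here the short text `["stock", "today"]` gains the term `market`, because `stock -> market` holds in every training document that contains `stock`. The expanded term list could then be passed to any standard vectorizer and classifier.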

     
