Review on Clustering Algorithms

Chen Xinquan; Zhou Lingjing; Liu Yaozhong

doi:10.12146/j.issn.2095-3135.201703004

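Abstract

Clustering is an important data preprocessing method in data mining research. Its goal is to recover the intrinsic distribution structure of the valuable data in an unlabeled dataset, and thereby to simplify the description of that dataset. After decades of research, more than a thousand clustering algorithms have been proposed for different applications and data characteristics, yet each has its own range of applicability and its own shortcomings. Traditional clustering algorithms can be roughly divided into partitioning, hierarchical, density-based, grid-based, and model-based methods. Following a review and summary of these traditional methods, this article focuses on three recently proposed families, namely synchronization clustering, affinity propagation clustering, and density peaks clustering, and discusses the applications and development directions of these algorithms.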

Abstract: Clustering is an important research topic in data mining, where it serves as a data preprocessing step. It is an unsupervised learning method that tries to find well-separated clusters in unlabeled data, usually by maximizing intra-cluster similarity and minimizing inter-cluster similarity. Over the past decades, many clustering algorithms have been proposed to handle different tasks and data properties. However, every existing clustering method has its own strengths and weaknesses, and a universally applicable clustering method is still lacking. Traditional clustering methods are usually classified into partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. After a brief review of the classical clustering methods, we focus on several recently proposed methods: the synchronization clustering algorithm, the affinity propagation algorithm, and the density peaks algorithm. Based on an analysis and comparison of these algorithms, their potential applications and research directions are also discussed.
