An Entity Resolution Approach Based on Random Forest

    Abstract:

    Entity resolution groups data objects that refer to the same real-world entity, whether they come from one or several data sources, and plays an important role in data cleaning, data integration, and data mining. However, the features of an entity may evolve irregularly over time, which makes entity resolution significantly more challenging. Traditional approaches can handle features that change regularly with time, but cannot handle features that change irregularly. This paper proposes a classification-based approach to solve this problem. First, random forest, a machine learning algorithm, is used to calculate the similarity of records. Then, a new two-stage clustering algorithm is employed to cluster the records. Finally, an evaluation on real data sets shows that the approach effectively improves the resolution accuracy for evolving entities.
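
The similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no details of the features, the forest configuration, or the two-stage clustering algorithm. The sketch assumes records are tuples of strings, uses a bagged ensemble of one-split trees as a deliberately simplified stand-in for a full random forest, and takes the fraction of trees voting "same entity" as the similarity. All function names (`pair_features`, `train_forest`, `similarity`, `group_records`) are hypothetical, and `group_records` is only a placeholder that links pairs above a threshold; it does not reproduce the paper's two-stage clustering.

```python
import random
from difflib import SequenceMatcher
from itertools import combinations


def pair_features(a, b):
    """Per-attribute string similarity in [0, 1] for two records (tuples of strings)."""
    return [SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)]


def train_stump(X, y, rng):
    """Fit a one-split tree on a bootstrap sample, using one randomly chosen feature."""
    sample = [rng.randrange(len(X)) for _ in X]   # bootstrap: draw with replacement
    f = rng.randrange(len(X[0]))                  # random feature choice
    # Candidate thresholds: observed values, plus +inf meaning "never predict a match".
    candidates = sorted({X[i][f] for i in sample}) + [float("inf")]
    best_t, best_err = None, None
    for t in candidates:
        # Count sample pairs misclassified by the rule "match if feature >= t".
        err = sum((X[i][f] >= t) != bool(y[i]) for i in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return f, best_t


def train_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps on feature vectors X with labels y (1 = same entity)."""
    rng = random.Random(seed)
    return [train_stump(X, y, rng) for _ in range(n_trees)]


def similarity(forest, a, b):
    """Fraction of trees that vote 'same entity' for the record pair (a, b)."""
    feats = pair_features(a, b)
    return sum(feats[f] >= t for f, t in forest) / len(forest)


def group_records(records, forest, threshold=0.5):
    """Placeholder grouping: link pairs whose similarity reaches the threshold."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similarity(forest, records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In this sketch, `train_forest` is fitted on `pair_features` vectors of labelled record pairs, and `group_records` then returns lists of record indices that the ensemble considers to be the same entity. A learned similarity of this kind does not assume any fixed pattern of change in an attribute, which is the motivation the abstract gives for a classification-based approach.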

Get Citation

YANG Meng, NIE Tiezheng, SHEN Derong, et al. An Entity Resolution Approach Based on Random Forest[J]. Journal of Integration Technology, 2018, 7(2): 57-68.

History
  • Online: March 20, 2018