Information extraction is an important area of data mining. Text information extraction means extracting specified information from a passage of free text and storing it as structured data in a knowledge base for user queries or further processing. Character (person) attribute extraction is an important means of building person search engines, and is also a technology that supports understanding by computer programs. This paper presents an automatic method for obtaining character attributes from an encyclopedia: the part-of-speech tag of each attribute value is used to locate that value in the encyclopedia's free text. Extraction rules are discovered by a statistical method, and character attribute information is then obtained from the encyclopedia text by rule matching. Experiments show that the method is effective in extracting character attribute information from encyclopedia text. The extracted results can be used to build a knowledge base of character attributes.
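The pipeline summarized above (locate attribute values by their part-of-speech tags, derive rules statistically, then extract by rule matching) can be illustrated with a minimal sketch. It is not the authors' implementation. The rule shape (attribute, preceding token, tag of the value), the tag names (`nr`, `v`, `p`, `t`, `ns`), the seed values, the `min_count` threshold, and the function names `discover_rules` and `extract` are all assumptions made for illustration, and the sentences are assumed to be tagged already.

```python
from collections import Counter

# Hypothetical pre-tagged sentences: lists of (token, POS tag) pairs.
# Illustrative tags: nr = person name, v = verb, p = preposition,
# t = time expression, ns = place name.
training = [
    [("Zhang", "nr"), ("was", "v"), ("born", "v"), ("in", "p"), ("1962", "t")],
    [("Wang", "nr"), ("was", "v"), ("born", "v"), ("in", "p"), ("1975", "t")],
    [("Zhang", "nr"), ("was", "v"), ("born", "v"), ("in", "p"), ("Chengdu", "ns")],
    [("Wang", "nr"), ("was", "v"), ("born", "v"), ("in", "p"), ("Harbin", "ns")],
]

# Known attribute values (assumed seeds) used to locate contexts in free text.
seeds = {
    "birth_date": {"1962", "1975"},
    "birthplace": {"Chengdu", "Harbin"},
}


def discover_rules(tagged_sentences, seeds, min_count=2):
    """Count (attribute, preceding token, value tag) contexts around seed
    values and keep those observed at least min_count times."""
    counts = Counter()
    for sentence in tagged_sentences:
        for i in range(1, len(sentence)):
            token, tag = sentence[i]
            for attribute, values in seeds.items():
                if token in values:
                    trigger = sentence[i - 1][0]
                    counts[(attribute, trigger, tag)] += 1
    return {rule for rule, n in counts.items() if n >= min_count}


def extract(tagged_sentence, rules):
    """Fill an attribute with the first token whose tag and preceding
    token match one of the rules."""
    found = {}
    for i in range(1, len(tagged_sentence)):
        token, tag = tagged_sentence[i]
        trigger = tagged_sentence[i - 1][0]
        for attribute, rule_trigger, rule_tag in rules:
            if trigger == rule_trigger and tag == rule_tag:
                found.setdefault(attribute, token)
    return found


rules = discover_rules(training, seeds)
new_sentence = [("Liu", "nr"), ("was", "v"), ("born", "v"), ("in", "p"), ("Nanjing", "ns")]
print(extract(new_sentence, rules))
```

In this toy setting the tag of the value is what separates the two attributes: both follow the token "in", but only a place-name tag matches the birthplace rule. A real system would need a tagger suited to the encyclopedia's language and a richer rule form than a single preceding token.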
Citing format: Li Hongliang, Yang Yan, Yin Hongfeng, et al. Rules-Based Character Attributes Extraction from Baidu Encyclopedia[J]. Journal of Integration Technology, 2013, 2(3): 1-4.