Abstract: This paper introduces Geeking, a sports news search engine built from four functional modules: web crawling, champion list building, search processing, and the user interface. Geeking provides query correction, query auto-completion, search result ranking, news clustering, keyword highlighting, and snapshot visualization. Given a query, the system automatically completes it using the search logs and popular news keywords. If a query returns no results, the system corrects it and recommends alternative query terms. Relevant documents are retrieved quickly through the champion list. Document relevance is computed from tf-idf values together with other factors such as the news headline and release time. To cluster similar news, the longest common subsequence and the Levenshtein distance are used to measure the similarity between news headlines, and headline similarity is taken as the similarity between documents. Test results show that Geeking is fast and stable.
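
The headline similarity measure used for clustering can be illustrated with a minimal sketch. The abstract does not state how the longest common subsequence and the Levenshtein distance are combined, so the normalization by the longer headline's length, the equal weighting, and the names `lcs_length`, `levenshtein`, `headline_similarity`, and `lcs_weight` are all assumptions made for illustration, not Geeking's actual implementation.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def headline_similarity(a: str, b: str, lcs_weight: float = 0.5) -> float:
    """Similarity in [0, 1] between two headlines.

    Assumption: both measures are normalized by the longer headline's
    length and mixed with a hypothetical weight `lcs_weight`.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty headlines are treated as identical
    lcs_score = lcs_length(a, b) / longest
    edit_score = 1.0 - levenshtein(a, b) / longest
    return lcs_weight * lcs_score + (1.0 - lcs_weight) * edit_score
```

Under these assumptions, two headlines would be placed in the same cluster when `headline_similarity` exceeds a chosen threshold; the threshold itself would need to be tuned on real news data.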