Summary: | Master's === Yuan Ze University === Department of Information Management === 99 === This study aims to approximate Google's ranking results using terms semantically related to the query. First, we crawled Google search results and extracted each web page's title, snippet, and URL. We then identified semantically related terms using two approaches: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Next, we calculated scores for keywords in the title, keywords in the snippet, and keywords in the URL to obtain a document score. Several experiments were conducted on different combinations of the number of semantically related terms, the number of documents, the tokenization method (unigram or n-gram), and the number of topics (one or two) of semantically related terms. The experimental results showed that the average R-Precision reaches 0.8, indicating that the ranking produced by the proposed method approximates Google's results.
|
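The R-Precision figure reported in the summary can be illustrated with a minimal sketch. The abstract does not say how the relevant set was defined, so this example assumes that the top R results of the reference (Google) ranking are treated as relevant and compared with the top R results of the re-ranked list. The function name `r_precision`, the cutoff `r`, and the sample URLs are all hypothetical and are not taken from the thesis.

```python
def r_precision(reference_ranking, candidate_ranking, r):
    """Share of the top-r reference items that also appear in the top-r candidate items.

    Assumption: the top-r items of the reference ranking form the relevant set.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    relevant = set(reference_ranking[:r])
    retrieved = set(candidate_ranking[:r])
    return len(relevant & retrieved) / r


# Hypothetical data: the order returned by the search engine and a re-ranked order.
google_order = ["u1", "u2", "u3", "u4", "u5", "u6"]
our_order = ["u2", "u1", "u5", "u3", "u4", "u6"]

score_at_3 = r_precision(google_order, our_order, 3)
score_at_5 = r_precision(google_order, our_order, 5)
```

Under this reading, the score measures only how much the two top-R sets overlap. It does not measure whether the items appear in the same order within those sets.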