Web Query Classification based on Latent Topic Analysis

Bibliographic Details
Main Authors: Bo-Ting Yeh, 葉柏廷
Other Authors: Cheng-Zen Yang
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/06346597981578729118
Description
Summary: Master's thesis === Yuan Ze University === Department of Computer Science and Engineering === 99 === Web search engines play an important role in helping people find information efficiently in massive Web data. Web query classification (WQC), the task of classifying Web queries into relevant Web categories, is a crucial problem in search engine technology. WQC poses two major difficulties. First, most queries are short and ambiguous. Second, many queries express more than one user intention. This research therefore proposes a scheme that uses multiple search engines to enrich user queries and then extracts multiple latent topics from the expanded queries. The scheme uses the Latent Dirichlet Allocation (LDA) model to extract the latent topics from the enriched queries for query classification. Experiments show that the approach improves precision by 6.5% and F1 by 6.6% in comparison with the schemes proposed by Shen et al. in 2005. These results indicate that the proposed LDA-based scheme can effectively improve WQC performance.
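
The two steps named in the summary, query enrichment and latent topic extraction, can be illustrated with a minimal sketch. It is not the thesis's implementation: the snippet data, the engine names, the helper names `enrich` and `lda_topic_mixtures`, and all hyperparameters are assumptions made for illustration, and the LDA model is fitted here with a small collapsed Gibbs sampler chosen only to keep the example free of external dependencies. The summary does not say how topics are mapped to Web categories, so that step is left out.

```python
import random

def enrich(query, snippets_by_engine):
    """Append tokens from every engine's result snippets to the query tokens."""
    tokens = query.lower().split()
    for snippets in snippets_by_engine.values():
        for snippet in snippets:
            tokens.extend(snippet.lower().split())
    return tokens

def lda_topic_mixtures(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic mixture per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                               # tokens assigned to topic k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]

# Hypothetical snippets standing in for results from two search engines.
hypothetical_results = {
    "jaguar": {
        "engine_a": ["jaguar car dealer price", "jaguar cat habitat rainforest"],
        "engine_b": ["jaguar luxury car review"],
    },
    "apple": {
        "engine_a": ["apple fruit nutrition recipe", "apple laptop phone store"],
        "engine_b": ["apple phone review price"],
    },
}

enriched = [enrich(q, s) for q, s in hypothetical_results.items()]
theta = lda_topic_mixtures(enriched, num_topics=2)

for query, mixture in zip(hypothetical_results, theta):
    print(query, [round(p, 3) for p in mixture])
```

Each row of `theta` is a probability distribution over latent topics for one enriched query. A query with more than one intention would be expected to spread its weight across several topics, which is the property the summary relies on for multi-category classification.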