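Summary: | Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 95 === Abstract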
We usually rely on effective search engines to make full use of the rich information on the web. In many situations, however, the query is short, and even a powerful search engine cannot infer the user's actual goal. Statistics report that approximately 77% of users submit only one keyword to a search engine, and that queries submitted by 85% of users contain fewer than three keywords. We call such queries incomplete queries. Our goal is to separate the main concept corresponding to a query into sub-concepts, one of which may be related to the real interest of the user. Documents usually lie in a very high-dimensional space, in which each keyword corresponds to one dimension. We therefore propose a dimension reduction method based on a manifold learning approach for the clustering process, and we adopt Isomap for the dimension reduction. Experimentally, the dimension-reduced dataset produced by Isomap gives us a better presentation of the data. Moreover, because the data size is reduced, the execution time of the whole mining process can be close to real time. Several variants of Isomap are also studied in this thesis.
|
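As an illustration of the Isomap step mentioned in the abstract, the following is a minimal sketch of the standard three-stage Isomap procedure (k-nearest-neighbour graph, shortest-path geodesic distances, classical MDS). It is not the implementation used in the thesis: the function names `knn_geodesics`, `classical_mds`, and `isomap`, the parameter defaults, and the toy semicircle data are all assumptions made for this example. It uses only the Python standard library and is cubic in the number of points, so it suits small demonstrations only.

```python
import math
import random


def knn_geodesics(points, k):
    """Isomap steps 1-2: build a k-NN graph, then all-pairs shortest paths."""
    n = len(points)
    geo = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        geo[i][i] = 0.0
        # Indices of the other points, nearest first.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            # Symmetric edge weighted by Euclidean distance.
            geo[i][j] = geo[j][i] = math.dist(points[i], points[j])
    # Floyd-Warshall: graph distances approximate geodesics on the manifold.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if geo[i][m] + geo[m][j] < geo[i][j]:
                    geo[i][j] = geo[i][m] + geo[m][j]
    return geo


def classical_mds(dist, dim, iters=200, seed=0):
    """Isomap step 3: embed a symmetric distance matrix with classical MDS."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J * D^2 * J (D is symmetric, so the
    # column means equal the row means).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for c in range(dim):
        # Power iteration for the dominant eigenvector of B.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][c] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


def isomap(points, k, dim):
    """Reduce `points` to `dim` dimensions; assumes a connected k-NN graph."""
    geo = knn_geodesics(points, k)
    if any(math.isinf(d) for row in geo for d in row):
        raise ValueError("k-NN graph is disconnected; increase k")
    return classical_mds(geo, dim)


if __name__ == "__main__":
    # Toy data (an assumption for this sketch): 12 points on a semicircle.
    pts = [(math.cos(math.pi * i / 11), math.sin(math.pi * i / 11))
           for i in range(12)]
    for row in isomap(pts, k=2, dim=1):
        print(round(row[0], 3))
```

On the toy semicircle, the one-dimensional embedding orders the points along the arc, which is the sense in which Isomap "unrolls" a curved manifold. For document data, the input points would be keyword vectors, and a clustering algorithm would then be run on the reduced coordinates.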