Summary: | Master's === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 95 === In recent years, more and more online databases on the World Wide Web generate pages dynamically. These web databases usually provide a query interface through which users search for the information they care about. For example, shopping websites provide query interfaces that allow users to search for products within a chosen category. After a user submits a query, the shopping website's database returns web pages containing information about related products. Selecting a category narrows the search space to a few categories, which improves the accuracy of the returned pages and saves search time.
However, different web databases usually have their own category directories, which may be very similar but are not identical. In this thesis, we focus on finding the category mapping between different shopping websites. We represent each web page and each category in the shopping websites as a vector in an n-dimensional vector space. We then use the cosine similarity measure to predict which category in another shopping website each web page (document) belongs to, and we calculate precision and recall to evaluate the performance of this prediction. Finally, we propose a classification-based greedy selection approach to select the best set of category mappings between the different shopping websites. With a good category mapping among related shopping web databases, users can obtain complete, highly accurate information quickly and at once.
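The prediction and evaluation steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the raw term-frequency weighting, the function names (`tf_vector`, `cosine`, `predict_category`, `precision_recall`), and the toy categories and pages are all assumptions made for illustration, and the greedy selection step is not shown.

```python
import math
from collections import Counter


def tf_vector(text):
    """Build a sparse term-frequency vector (assumed weighting, for illustration only)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_category(page_text, category_vectors):
    """Assign a page to the category whose vector is most similar to the page vector."""
    page_vec = tf_vector(page_text)
    return max(category_vectors, key=lambda c: cosine(page_vec, category_vectors[c]))


def precision_recall(predicted, actual, category):
    """Precision and recall of the predictions for a single category."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == category and a == category)
    pred_pos = sum(1 for p in predicted if p == category)
    actual_pos = sum(1 for a in actual if a == category)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical categories of a second shopping website, each summarized by a term vector.
site_b_categories = {
    "Laptops": tf_vector("laptop notebook computer ssd ram screen"),
    "Cameras": tf_vector("camera lens zoom digital photo sensor"),
}

# Hypothetical pages from a first shopping website, paired with their true target category.
site_a_pages = [
    ("thin notebook computer with fast ssd", "Laptops"),
    ("digital camera with zoom lens", "Cameras"),
    ("photo sensor upgrade for laptop", "Laptops"),
]

predicted = [predict_category(text, site_b_categories) for text, _ in site_a_pages]
actual = [label for _, label in site_a_pages]

for category in site_b_categories:
    print(category, precision_recall(predicted, actual, category))
```

In this toy data the third page shares more terms with the "Cameras" vector than with "Laptops", so it is misclassified; this shows how precision and recall diverge for a category when the two sites describe similar products with different vocabulary.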
|