Summary: | Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 99 === Web activities and services are increasing rapidly. In recent years, user intent has mostly been predicted from the relation between query keywords and the result pages returned by a search engine or portal. Analyzing users' access data or activities on a website can help web service providers improve the accuracy of the result pages for a query keyword, improve website performance by caching result pages and pre-fetching web pages, personalize web page recommendation and ranking systems, improve commercial advertising for products, and support information filtering applications. Capturing the context of a user's previous browsing behavior in order to predict user intent is therefore an important issue and challenge.
Most studies predict user intent from the user's query keywords and from the relation between a query keyword and the pages the user clicks next in the result page. We implement two models: a Top-Level Domain (TLD) model trained on URL-based features, and a Hidden Markov Model (HMM) trained on context-aware category sequences derived from the user's browsing URLs. We then propose a mixture model that combines TLD and HMM to predict the category of the user's next accessed page. Finally, we apply the proposed context-aware web page category prediction model to two filtering applications: objectionable web content filtering and web security threat prevention.
|
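The idea of mixing a URL-based model with a sequence model can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how the two models are combined, so a linear interpolation with a hypothetical `weight` parameter is assumed here, and a first-order Markov chain over observed categories stands in for the full HMM to keep the example short. The class name `MixturePredictor`, the helper names, the example URLs, and the category labels are all illustrative assumptions, as is the premise that the URL of the next page is known at prediction time.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def normalize(counts, categories):
    """Turn raw counts into a probability distribution with add-one smoothing."""
    total = sum(counts.get(c, 0) for c in categories) + len(categories)
    return {c: (counts.get(c, 0) + 1) / total for c in categories}


def top_level_domain(url):
    """Return the last label of the host name, e.g. 'edu' for 'http://a.b.edu/x'."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1]


class MixturePredictor:
    """Hypothetical mixture of a TLD-based model and a category-sequence model."""

    def __init__(self, categories, weight=0.5):
        self.categories = list(categories)
        self.weight = weight  # assumed interpolation weight given to the TLD model
        self.tld_counts = defaultdict(Counter)         # TLD -> category counts
        self.transition_counts = defaultdict(Counter)  # previous -> next category counts

    def fit(self, sessions):
        """sessions: list of browsing sessions, each a list of (url, category) pairs."""
        for session in sessions:
            previous = None
            for url, category in session:
                self.tld_counts[top_level_domain(url)][category] += 1
                if previous is not None:
                    self.transition_counts[previous][category] += 1
                previous = category

    def predict_proba(self, previous_category, next_url):
        """Linearly interpolate the two per-category distributions."""
        tld_dist = normalize(self.tld_counts[top_level_domain(next_url)], self.categories)
        seq_dist = normalize(self.transition_counts[previous_category], self.categories)
        w = self.weight
        return {c: w * tld_dist[c] + (1 - w) * seq_dist[c] for c in self.categories}

    def predict(self, previous_category, next_url):
        """Return the category with the highest mixed probability."""
        proba = self.predict_proba(previous_category, next_url)
        return max(proba, key=proba.get)


# Toy data, invented purely for illustration.
sessions = [
    [("http://news.example.com/a", "news"), ("http://uni.example.edu/b", "education")],
    [("http://portal.example.com/c", "news"), ("http://school.example.edu/d", "education")],
]
model = MixturePredictor(["news", "education"], weight=0.5)
model.fit(sessions)
print(model.predict("news", "http://other.example.edu/z"))
```

In a filtering setting, the predicted category could be checked against a block list before the page is fetched; the weight would normally be tuned on held-out browsing sessions rather than fixed at 0.5.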