Quality analyses and improvement for fuzzy clustering and web personalization

Bibliographic Details
Main Author: Ketata, Amir
Format: Others
Published: 2009
Online Access: http://spectrum.library.concordia.ca/976323/1/MR63244.pdf
Ketata, Amir (2009) Quality analyses and improvement for fuzzy clustering and web personalization. Masters thesis, Concordia University.
Description
Summary: Web mining researchers and practitioners continue to innovate and create new technologies that help web site managers improve their web-based services efficiently and make information retrieval easier for web site users. The growing amount of information and services offered through the Web, coupled with the increase in web-based transactions, calls for systems that can handle huge volumes of usage information efficiently while providing good predictions or recommendations and personalization of web sites. In this thesis we first focus on clustering to obtain a usage model from web log data and investigate ways to improve clustering quality. We also consider applications, focusing on generating predictions through collaborative filtering, which matches the behavior of a current user with that of past like-minded users. To provide dependable performance analysis and improve clustering quality, we study four fuzzy clustering algorithms and compare their effectiveness and efficiency in web prediction. Dependability concerns further led us to investigate the objectivity of validity indices and to choose a more objective index for assessing the relative performance of the clustering techniques. We also use appropriate statistical testing methods in our experiments to distinguish real differences from those that may be due to sampling or other errors. Our results reconfirm some claims made previously about these clustering and prediction techniques, while also suggesting the need to assess both cluster validation and prediction quality for a sound comparison of the clustering techniques. To assess the quality of aggregate usage profiles (UPs), we devised a set of criteria that reflect the semantic characterization of UPs and help avoid resorting to subjective human judgment when assessing UPs and clustering quality. We formulate each criterion as a computable measure for individual UPs as well as for groups of UPs.
We applied these criteria in the final phase of fuzzy clustering, and a user survey confirmed their soundness and usability.
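The summary names neither the four fuzzy clustering algorithms nor the exact prediction scheme. As a rough illustration only, the sketch below assumes fuzzy c-means (FCM) as a representative fuzzy clustering algorithm. It treats each cluster center as an aggregate usage profile (a vector of page weights) and makes a simple profile-based collaborative-filtering prediction: the active session is matched to its nearest profile, and the unvisited pages with the highest weights are recommended. The function names `fuzzy_c_means` and `recommend`, the parameters (`m=2`, `tol`, `seed`), and the six-page session data are hypothetical and are not taken from the thesis.

```python
import math
import random
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]


def fuzzy_c_means(data: Sequence[Sequence[float]], c: int, m: float = 2.0,
                  max_iter: int = 100, tol: float = 1e-5,
                  seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Standard fuzzy c-means. Returns (centers, memberships), where
    memberships[k][i] is the degree to which session k belongs to cluster i.
    Assumption: FCM stands in for the (unnamed) algorithms studied in the thesis."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial membership matrix; each row sums to 1.
    u: Matrix = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])

    centers: Matrix = []
    for _ in range(max_iter):
        # Center update: mean of all sessions weighted by membership^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[k] * data[k][d] for k in range(n)) / w_sum
                            for d in range(dim)])
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1)).
        new_u: Matrix = []
        for k in range(n):
            dists = [math.dist(data[k], ctr) for ctr in centers]
            if min(dists) == 0.0:
                # Session sits exactly on a center: give it crisp membership.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                p = 2.0 / (m - 1.0)
                new_u.append([1.0 / sum((dists[i] / dists[j]) ** p for j in range(c))
                              for i in range(c)])
        # Stop once no membership changed by more than tol.
        shift = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def recommend(session: Sequence[float], profiles: Matrix,
              visited: Set[int], top_n: int = 3) -> List[int]:
    """Hypothetical profile-based prediction: match the active session to its
    nearest profile (cluster center) and return the highest-weighted pages
    the user has not visited yet."""
    best = min(profiles, key=lambda prof: math.dist(session, prof))
    unseen = [page for page in range(len(best)) if page not in visited]
    return sorted(unseen, key=lambda page: best[page], reverse=True)[:top_n]


# Hypothetical binary sessions over six pages: pages 0-2 and pages 3-5
# tend to be visited together.
sessions = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
centers, memberships = fuzzy_c_means(sessions, c=2)
recs = recommend([1, 1, 0, 0, 0, 0], centers, visited={0, 1})
print(recs)
```

Because each membership row sums to 1, a session can belong partially to several profiles. A crisp scheme such as k-means would instead assign each session to exactly one profile.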