Summary: | Master's === Chung Hua University === Master's Program, Department of Computer Science and Information Engineering === 92 === This thesis studies web document classification, motivated by the exponential growth in the number of web documents. We construct a model based on a Genetic Algorithm (GA) that chooses the best threshold values for the condition parameters used to sieve out suitable keywords, replacing the "trial-and-error" method used in other theses. To verify that the chosen threshold values are optimal, we apply both the Vector Space Model (VSM) and the Support Vector Machine (SVM) to classify documents by the selected keywords, and we analyze the resulting "recall rate" and "precision rate". We also compare the classification results obtained with a single condition parameter against those obtained with multiple condition parameters. The major problem is the trade-off between "recall rate" and "precision rate", since both values are important to users. Analysis of the classification results shows that the threshold values of the condition parameters derived by the GA model give the best classification results. Finally, this study uses the threshold values of four condition parameters to sieve out suitable keywords, which increases both "recall rate" and "precision rate".
|