Automatic Web Document Classification Based on Genetic Algorithm
Master's thesis === Chung Hua University === Department of Information Engineering (Master's Program) === academic year 92 (ROC calendar) === In this thesis we study web document classification, motivated by the exponential growth of web documents. We construct a model based on a Genetic Algorithm (GA) that chooses the best threshold values for the condition parameters used to sieve out suitable ke...
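The abstract does not specify how the GA encodes candidate thresholds or how it scores them. The following Python sketch is one plausible reading, not the thesis's implementation. It is a real-coded GA in which each individual is a vector of thresholds (one per condition parameter), using tournament selection, arithmetic crossover, Gaussian mutation, and elitism. The function `genetic_threshold_search`, its parameters, and the toy fitness function are hypothetical. In an actual keyword-selection setting, the fitness would be a classification metric, for example an F1 score combining recall and precision on held-out documents.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch: the record does not describe the thesis's actual encoding,
# operators, or fitness function. Each individual here is a list of thresholds,
# one per condition parameter, searched within (low, high) bounds.
Bounds = Sequence[Tuple[float, float]]


def genetic_threshold_search(
    fitness: Callable[[List[float]], float],
    bounds: Bounds,
    pop_size: int = 30,
    generations: int = 50,
    crossover_rate: float = 0.9,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Real-coded GA that maximizes `fitness` over threshold vectors."""
    if pop_size < 3:
        raise ValueError("pop_size must be at least 3 for tournament selection")
    rng = random.Random(seed)

    def random_individual() -> List[float]:
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def tournament(pop: List[List[float]], scores: List[float], k: int = 3) -> List[float]:
        # Sample k distinct individuals and return the fittest of them.
        picks = rng.sample(range(len(pop)), k)
        return pop[max(picks, key=lambda i: scores[i])]

    population = [random_individual() for _ in range(pop_size)]
    scores = [fitness(ind) for ind in population]

    for _ in range(generations):
        # Elitism: copy the current best individual into the next generation.
        best_i = max(range(pop_size), key=lambda i: scores[i])
        next_pop = [list(population[best_i])]
        while len(next_pop) < pop_size:
            p1 = tournament(population, scores)
            p2 = tournament(population, scores)
            if rng.random() < crossover_rate:
                # Arithmetic crossover: a convex combination stays inside the bounds.
                a = rng.random()
                child = [a * x + (1.0 - a) * y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            # Gaussian mutation scaled to each parameter's range, clipped to bounds.
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[j] = min(hi, max(lo, child[j] + rng.gauss(0.0, 0.1 * (hi - lo))))
            next_pop.append(child)
        population = next_pop
        scores = [fitness(ind) for ind in population]

    best_i = max(range(pop_size), key=lambda i: scores[i])
    return population[best_i], scores[best_i]


if __name__ == "__main__":
    # Hypothetical toy fitness peaking at thresholds (0.3, 5.0); in practice this
    # could be, e.g., the F1 score of a classifier built from the keywords that
    # pass the thresholds.
    def toy_fitness(t: List[float]) -> float:
        return -((t[0] - 0.3) ** 2 + ((t[1] - 5.0) / 10.0) ** 2)

    best, score = genetic_threshold_search(toy_fitness, [(0.0, 1.0), (0.0, 20.0)])
    print(best, score)
```

Before mutation, each child is a convex combination of its parents, and mutation is clipped, so every threshold stays within its bounds. Because of elitism, the best score never decreases from one generation to the next.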
Main Authors: | Ya-Hui Chen, 陳雅慧 |
---|---|
Other Authors: | Prof. Chih-Hsun Chou, 周智勳 |
Format: | Others |
Language: | zh-TW |
Published: | 2004 |
Online Access: | http://ndltd.ncl.edu.tw/handle/68396692157353824068 |
id | ndltd-TW-092CHPI0392006 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-092CHPI0392006 ; 2016-01-04T04:08:39Z ; http://ndltd.ncl.edu.tw/handle/68396692157353824068 ; Automatic Web Document Classification Based on Genetic Algorithm ; 以基因演算法為基礎建立網頁自動分類機制 (Building an Automatic Web Page Classification Mechanism Based on Genetic Algorithms) ; Ya-Hui Chen 陳雅慧 ; Master's ; Chung Hua University ; Department of Information Engineering (Master's Program) ; 92 ; Prof. Chih-Hsun Chou 周智勳 ; 2004 ; thesis ; 110 ; zh-TW |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description | Master's thesis === Chung Hua University === Department of Information Engineering (Master's Program) === academic year 92 (ROC calendar) === In this thesis we study web document classification, motivated by the exponential growth of web documents. We construct a model based on a Genetic Algorithm (GA) that chooses the best threshold values for the condition parameters used to sieve out suitable keywords, replacing the trial-and-error approach used in other theses. To show that the chosen threshold values are optimal, we classify the documents by the selected keywords with both the Vector Space Model (VSM) and a Support Vector Machine (SVM), and analyze the resulting recall and precision. We also compare the classification results obtained with a single condition parameter against those obtained with multiple condition parameters. The main difficulty is the trade-off between recall and precision, since both matter to users. The analysis of the classification results shows that the threshold values derived by the GA model give the best classification results. Finally, using the threshold values of four condition parameters to sieve out keywords increases both recall and precision. |
author2 | Prof. Chih-Hsun Chou |
author | Ya-Hui Chen 陳雅慧 |
title | Automatic Web Document Classification Based on Genetic Algorithm |
publishDate | 2004 |
url | http://ndltd.ncl.edu.tw/handle/68396692157353824068 |
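According to the description field, the selected keywords were evaluated by classifying documents with the Vector Space Model (and an SVM) and measuring recall and precision. The record does not give the VSM weighting or decision rule. The following Python sketch therefore assumes raw term-frequency vectors limited to the selected keywords, and it assigns each document to the category whose summed keyword vector has the highest cosine similarity (a Rocchio-style centroid rule). The function names, keyword set, and toy documents are hypothetical, and the sketch does not cover the SVM classifier.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical sketch: raw term-frequency vectors over the selected keywords and a
# centroid (Rocchio-style) cosine rule stand in for the thesis's VSM classifier,
# whose exact weighting and decision rule are not given in this record.
Vector = Dict[str, float]


def keyword_vector(tokens: Sequence[str], keywords: Set[str]) -> Vector:
    """Term-frequency vector restricted to the selected keywords."""
    return dict(Counter(t for t in tokens if t in keywords))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(docs: List[List[str]], labels: List[str], keywords: Set[str]) -> Dict[str, Vector]:
    # Summing instead of averaging is fine: cosine similarity ignores vector length.
    sums: Dict[str, Counter] = {}
    for tokens, label in zip(docs, labels):
        sums.setdefault(label, Counter()).update(keyword_vector(tokens, keywords))
    return {label: dict(c) for label, c in sums.items()}


def classify(tokens: List[str], centroids: Dict[str, Vector], keywords: Set[str]) -> str:
    # Pick the category whose centroid is most similar to the document.
    vec = keyword_vector(tokens, keywords)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def precision_recall(predicted: List[str], actual: List[str], label: str) -> Tuple[float, float]:
    tp = sum(p == label and a == label for p, a in zip(predicted, actual))
    fp = sum(p == label and a != label for p, a in zip(predicted, actual))
    fn = sum(p != label and a == label for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    keywords = {"gene", "fitness", "mutation", "svm", "kernel", "margin"}
    train = [["gene", "algorithm", "fitness"], ["svm", "kernel", "margin"],
             ["fitness", "mutation", "gene"], ["kernel", "svm", "vector"]]
    centroids = train_centroids(train, ["ga", "svm", "ga", "svm"], keywords)
    test = [["gene", "mutation"], ["margin", "kernel"], ["vector", "fitness"]]
    actual = ["ga", "svm", "svm"]
    predicted = [classify(doc, centroids, keywords) for doc in test]
    print(predicted)  # ['ga', 'svm', 'ga']
    for label in ("ga", "svm"):
        print(label, precision_recall(predicted, actual, label))
```

In this toy run, the `ga` category reaches recall 1.0 with precision 0.5, while `svm` shows the reverse (precision 1.0, recall 0.5). This is the kind of recall/precision tension the abstract names as the main difficulty.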