Summary: | Master's === National Chiao Tung University === Department of Computer and Information Science === 87 === Document classification is important in information retrieval because it makes large document collections easier and more efficient to manage. One factor that affects classification performance is the class descriptor. The traditional method of extracting class descriptors takes the union of all document descriptors in a class to represent that class. This method yields a large number of class descriptors, low similarity between document descriptors and class descriptors, and long computing time. Hence, we propose a GA-based model, which combines concepts from information retrieval (such as similarity, weighting, and hit ratio) with characteristics of genetic algorithms (such as exploration and exploitation) to extract suitable class descriptors. The experimental results indicate that the proposed model, using the first fitness function, extracts class descriptors that have higher similarity to document descriptors and incur less space overhead.
|