A Research on Applying Term Frequency to Improve Multimembership Bayesian Theorem on Document Classification

Bibliographic Details
Main Authors: Lo, Jen-Chun, 羅仁君
Other Authors: Wu, Homer
Format: Others
Language: zh-TW
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/04308944692980294437
Description
Summary: Master's thesis, Chinese Culture University, Department of Information Management, academic year 100. In recent years, multimembership Bayesian (MMB) classification has been widely applied to medical records, websites, e-mail, and other documents, where its automatic classification and knowledge-inference functions improve processing efficiency. Because MMB is practical and popular across many fields, it remains an active topic of academic research. To strengthen MMB's core function, automatic document classification, this study adds a genetic-algorithm step before traditional MMB. Grounded in probability theory, this extra step derives an "automatic adaptive screening threshold" mathematically, which makes the threshold more accurate. The calculation identifies significant, frequently used words, and these form the threshold for the subsequent MMB classification. Because the genetic algorithm acts as an initial screen, the words that remain after screening are more precise features for document classification. The threshold derived from these calculations also supports automatic assessment, selecting the most suitable words without manual intervention. The study reports improved classification results: when the differences between classes are relatively large, accuracy reaches 83.93%, and even when the differences are smaller, accuracy still reaches 70.60%.
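The general pipeline the summary describes (screen words by term frequency, then classify the remaining words with a Bayesian model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's method. A standard multinomial naive Bayes classifier stands in for MMB. The threshold `min_tf` is fixed by hand rather than derived adaptively by a genetic algorithm. The toy corpus and all function names (`select_vocabulary`, `train_nb`, `classify`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def select_vocabulary(docs, min_tf):
    """Hypothetical screening step: keep words whose total term frequency
    across the training corpus is at least min_tf (a fixed threshold here,
    not the adaptive threshold described in the thesis)."""
    totals = Counter(word for doc in docs for word in doc)
    return {word for word, count in totals.items() if count >= min_tf}


def train_nb(docs, labels, vocab, alpha=1.0):
    """Multinomial naive Bayes with Laplace smoothing (a stand-in for MMB),
    trained only on words that survived screening."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(w for w in doc if w in vocab)
    n_docs = len(docs)
    # Log prior of each class from its share of training documents.
    priors = {c: math.log(n / n_docs) for c, n in class_docs.items()}
    likelihoods = {}
    for c in class_docs:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        # Smoothed log P(word | class) for every screened word.
        likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return priors, likelihoods


def classify(doc, vocab, priors, likelihoods):
    """Return the class with the highest log posterior; unscreened words are ignored."""
    scores = {
        c: priors[c] + sum(likelihoods[c][w] for w in doc if w in vocab)
        for c in priors
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus of pre-tokenized documents.
train_docs = [
    ["fever", "cough", "doctor", "fever"],
    ["doctor", "fever", "clinic"],
    ["invoice", "payment", "email", "payment"],
    ["email", "invoice", "meeting"],
]
train_labels = ["medical", "medical", "business", "business"]

vocab = select_vocabulary(train_docs, min_tf=2)
priors, likelihoods = train_nb(train_docs, train_labels, vocab)
print(sorted(vocab))  # ['doctor', 'email', 'fever', 'invoice', 'payment']
print(classify(["fever", "clinic", "email"], vocab, priors, likelihoods))  # medical
```

In this toy run, words that occur only once in the corpus ("cough", "clinic", "meeting") are dropped before training, so they contribute nothing at classification time. Raising `min_tf` discards more low-frequency words, and choosing that value automatically is the part the thesis addresses with its adaptive threshold.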