Summary: Master's thesis, National Central University, Graduate Institute of Information Management, academic year 95 (ROC calendar).

Journal papers provide professional domain knowledge. However, information overload makes finding relevant papers considerably time-consuming. Text categorization technology can help users retrieve domain-specific journal papers efficiently. The text categorization process has four phases: “text pre-processing”, “document feature construction”, “applying classification methods”, and “evaluation”. This research examines how three factors affect the effectiveness of journal paper categorization: the feature weighting scheme, the article field used (title, abstract, or keywords), and the classifier. It also applies a sampling distribution classifier within the process. Hypothesis testing shows that: first, feature ratio performs significantly better than feature frequency; second, the abstract field is more effective than the title and keyword fields, with no significant difference between the latter two; third, support vector machines are the most effective classifier, followed by naïve Bayes, decision trees, and the sampling distribution classifier, in that order; and fourth, text categorization of journal papers is feasible. An analysis of the sampling distribution classifier and recommendations for future study are also presented.
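The four-phase process described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The stopword list, the toy corpus, and all function and class names are hypothetical. The abstract does not define "feature ratio", so here it is assumed to mean a term's count divided by the document's token count, contrasted with raw counts for "feature frequency". A multinomial naive Bayes classifier stands in for the classifiers compared in the study; support vector machines, decision trees, and the sampling distribution classifier are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; the thesis does not specify one.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "with"}

def preprocess(text):
    """Phase 1 (text pre-processing): lowercase, tokenize, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def feature_frequency(tokens):
    """Phase 2, option A: raw term counts."""
    return dict(Counter(tokens))

def feature_ratio(tokens):
    """Phase 2, option B (ASSUMED definition): term count / document length."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

class NaiveBayes:
    """Phase 3: multinomial naive Bayes that accepts real-valued term weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for v in vectors for t in v}
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            totals = Counter()
            for v, y in zip(vectors, labels):
                if y == c:
                    totals.update(v)  # adds (possibly fractional) weights
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                t: math.log((totals[t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, vector):
        def score(c):
            # Terms unseen during training are ignored.
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in vector.items() if t in self.vocab
            )
        return max(self.classes, key=score)

def accuracy(model, vectors, labels):
    """Phase 4 (evaluation): fraction of documents classified correctly."""
    hits = sum(model.predict(v) == y for v, y in zip(vectors, labels))
    return hits / len(labels)

# Hypothetical toy corpus of (title, label) pairs, invented for illustration.
TRAIN = [
    ("support vector machines for text classification", "ml"),
    ("naive bayes classifier for document categorization", "ml"),
    ("supply chain inventory management", "scm"),
    ("inventory control in supply chain networks", "scm"),
]
TEST = [
    ("text categorization with support vector machines", "ml"),
    ("supply chain inventory planning", "scm"),
]

def evaluate(weighting):
    """Run all four phases with the given weighting function; return accuracy."""
    train_vecs = [weighting(preprocess(t)) for t, _ in TRAIN]
    model = NaiveBayes().fit(train_vecs, [y for _, y in TRAIN])
    test_vecs = [weighting(preprocess(t)) for t, _ in TEST]
    return accuracy(model, test_vecs, [y for _, y in TEST])

if __name__ == "__main__":
    for weighting in (feature_frequency, feature_ratio):
        print(weighting.__name__, evaluate(weighting))
```

Swapping the `weighting` argument is how this sketch compares the two schemes on the same pipeline. A toy corpus this small cannot reproduce the thesis's significance results, which require a real paper collection and a hypothesis test over repeated runs.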