Summary: | Master's thesis === National Kaohsiung First University of Science and Technology (國立高雄第一科技大學) === Graduate Institute of Information Management === 91 === Thanks to the proliferation of the Internet, documents have been shared rapidly across cyberspace in the past few years. However, this also causes knowledge workers to suffer from information overload. For a set of documents without suitable categorization, searching tends to be time-consuming. To overcome this problem, effective information retrieval has been studied extensively, with the main objective of efficiently matching documents to users’ requirements. Nevertheless, in real-world applications, people tend to use ‘fuzzy’ terms to express their thinking, so ambiguity and uncertainty are always present in the retrieval process. In addition, a document may address several topics, so it is better assigned to multiple categories.
In this paper, we propose a multiple-categorization approach based on fuzzy set theory that classifies documents into multiple categories. Furthermore, using the fuzzy correlation coefficient and the similarity tree graph, we analyze document similarity and perform hierarchical categorization. Finally, we conduct experiments on a set of conference papers to measure the precision and recall of our approach against manual categorization. Generally speaking, because of the complex semantics involved in documents, manual categorization can identify the most suitable topic for a document if its content is carefully read and digested. However, that process is usually time-consuming, so most categorization tasks are based simply on document titles or abstracts. Our approach helps alleviate this problem and provides a multi-categorization solution for most text categorization applications. In the future, the results can serve as a basis for constructing a document warehouse by applying text-mining methods.
|