Summary: This research investigates the extent to which ontologies can be used to achieve accurate classification performance in an automatic text classifier called the Automatic Classification Engine (ACE). The classifier's task is to classify Web pages according to the Dewey Decimal Classification (DDC) and Library of Congress Classification (LCC) schemes. In particular, this research focuses on how to:

1. build a set of ontologies that provide a mechanism for machine reasoning;
2. define the mappings between the ontologies and the two classification schemes;
3. implement an ontology-based classifier.

The design and implementation of the classifier concentrate on developing an ontology-based classification model. Given a Web page, the classifier applies the model to reason about which terms in the page represent significant concepts. It then uses the mappings to determine the DDC and LCC classes associated with those significant concepts and assigns these classes to the Web page.

Because constructing ontologies manually is time-consuming, the research also investigates a number of approaches for extending the coverage of the ontologies semi-automatically. This investigation led to the design and implementation of a semi-automatic ontology construction system that can recognise potential new terms, which can then be integrated into their associated ontologies using an ontology editor.

An experiment was conducted to validate the effectiveness of the classification model, in which the classifier classified several collections of Web pages. The classifier's performance was measured in terms of coverage and accuracy. The experimental evidence shows that the ontology-based automatic text classification approach outperformed the existing approaches.
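The classification step summarised here (find significant concepts in a page, then follow concept-to-class mappings to DDC and LCC classes) can be pictured as a simple lookup pipeline. The following is a minimal Python sketch under stated assumptions, not ACE itself. The `ONTOLOGY` and `MAPPINGS` tables, the `min_hits` threshold and the functions `significant_concepts` and `classify` are all hypothetical. The class numbers are illustrative only, and plain term counting stands in for the ontology-based reasoning that the research actually uses to decide which concepts are significant.

```python
import re
from collections import Counter

# Hypothetical ontology fragment: concept -> terms that signal it (illustrative only).
ONTOLOGY = {
    "machine learning": {"machine learning", "neural network", "classifier"},
    "databases": {"database", "sql", "query"},
}

# Hypothetical concept -> (DDC class, LCC class) mappings (illustrative values).
MAPPINGS = {
    "machine learning": ("006.31", "Q325.5"),
    "databases": ("005.74", "QA76.9.D3"),
}


def significant_concepts(text, min_hits=2):
    """Return concepts whose terms occur at least min_hits times in the text.

    Simple term counting is an assumed stand-in for ontology-based reasoning.
    """
    text = text.lower()
    hits = Counter()
    for concept, terms in ONTOLOGY.items():
        for term in terms:
            # Whole-word (or whole-phrase) matches only.
            hits[concept] += len(re.findall(r"\b" + re.escape(term) + r"\b", text))
    return [concept for concept, count in hits.items() if count >= min_hits]


def classify(text, min_hits=2):
    """Assign the DDC and LCC classes mapped from each significant concept."""
    ddc, lcc = set(), set()
    for concept in significant_concepts(text, min_hits):
        ddc_class, lcc_class = MAPPINGS[concept]
        ddc.add(ddc_class)
        lcc.add(lcc_class)
    return sorted(ddc), sorted(lcc)


if __name__ == "__main__":
    page = "A neural network is a classifier used in machine learning."
    print(classify(page))
```

Keeping the mappings in a separate table from the ontology mirrors the abstract's separation between building the ontologies and defining their mappings to the two schemes. In this sketch, either side can be extended, for example with terms found by a semi-automatic construction step, without changing the classification code.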