On Ontology Learning from Learning Resources of HTML Pages

Bibliographic Details
Main Authors: Wan-Shu Chao, 趙婉舒
Other Authors: Not provided
Format: Others
Language: zh-TW
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/wgp7ah
Description
Summary: Master's === Ming Chuan University === Master's Program, Department of Information Management === 92 === The Internet offers abundant and varied resources, which makes it hard to find the right ones. One important reason is that the keywords used by most search engines are inconsistent across domains and contexts. An ontology can reduce this problem by supplying more meaningful semantics, but building one is time-consuming and expensive. This thesis therefore proposes a semi-automatic approach to ontology building, based on a mechanism that learns from HTML page resources. We use the CKIP system to tag Chinese parts of speech, and we improve the TF-IDF method by taking the weights of different HTML tags into account when retrieving important domain terms. We also propose a two-level term-finding algorithm that discovers important terms (concepts and relations) both within a single domain and across domains. A set of Chinese heuristic grammar rules is developed to extract the “is-a” relation between concepts, which establishes the concept hierarchy, along with other concept relations. A relation clustering method is also proposed to establish the relation hierarchy. Finally, a preliminary experiment showed that three human experts judged the resulting ontology to be satisfactory.
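To make the tag-weighted TF-IDF idea in the summary concrete, here is a minimal sketch and not the thesis's actual formula, which the abstract does not give. The weights in TAG_WEIGHTS, the rule of using the highest-weighted enclosing tag, and the names WeightedTermCounter and tag_weighted_tfidf are all assumptions made for illustration. A simple regular-expression tokenizer stands in for CKIP segmentation and part-of-speech tagging.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag weights; the abstract does not state the values used.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "b": 1.5, "strong": 1.5}
DEFAULT_WEIGHT = 1.0


class WeightedTermCounter(HTMLParser):
    """Counts terms, scaling each occurrence by its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates sloppy HTML).
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Assumption: an occurrence takes the highest weight among open tags.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        # Stand-in tokenizer; the thesis uses CKIP for Chinese text.
        for term in re.findall(r"\w+", data.lower()):
            self.counts[term] += weight


def weighted_tf(html):
    """Return tag-weighted term counts for one HTML page."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def tag_weighted_tfidf(pages):
    """Return one {term: score} dict per page in `pages`."""
    tfs = [weighted_tf(page) for page in pages]
    n_pages = len(pages)
    # Document frequency: number of pages in which each term appears.
    df = Counter(term for tf in tfs for term in tf)
    scores = []
    for tf in tfs:
        total = sum(tf.values()) or 1.0
        scores.append(
            {t: (c / total) * math.log(n_pages / df[t]) for t, c in tf.items()}
        )
    return scores


pages = [
    "<html><title>Ontology learning</title>"
    "<body><p>ontology from web pages</p></body></html>",
    "<html><title>Search engines</title>"
    "<body><p>keywords in web pages</p></body></html>",
]
scores = tag_weighted_tfidf(pages)
print(max(scores[0], key=scores[0].get))
```

In this toy corpus, terms that appear on every page score zero, while a term that appears in a page's title and on no other page is pushed toward the top of that page's ranking. That is the effect the summary attributes to considering tag weights.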