Integration of Ontology and Semantic Similarity for extracting Keywords from Documents

Bibliographic Details
Main Authors: Yuan-Lin Chen, 陳宛琳
Other Authors: Chung-Shyan Liu
Format: Others
Language: zh-TW
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/46942011593856699312
id ndltd-TW-102CYCU5392030
record_format oai_dc
spelling ndltd-TW-102CYCU5392030 2015-10-13T23:49:49Z http://ndltd.ncl.edu.tw/handle/46942011593856699312 Integration of Ontology and Semantic Similarity for extracting Keywords from Documents 結合本體論與語意相似程度對文件萃取關鍵字 Yuan-Lin Chen 陳宛琳 Master's, Chung Yuan Christian University, Graduate Institute of Information Engineering, 102 Chung-Shyan Liu 留忠賢 2014 thesis 86 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Chung Yuan Christian University === Graduate Institute of Information Engineering === 102 === A document may contain a large number of words, but only a few of them are keywords that describe its content. These keywords also make it possible to distinguish the type of the document, so a sequence of extraction steps is needed to obtain them. This thesis presents an approach to extracting keywords from documents by combining knowledge in an Ontology with semantic similarity. The Ontology supplies all the knowledge that describes a word, and a semantic similarity calculation then selects the most suitable knowledge. Together, the two identify the keywords of a document. First, Lucene, a full-text search tool, is used to extract words from the document content and remove stop words. A two-stage stemming method then reduces the words to their root forms, and the words are tagged with a POS tagger. The meanings of the words are then looked up, and their similarity is computed using Lin's semantic similarity. Finally, a subset of keywords is selected using the domain Ontology information.
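The abstract names Lin's semantic similarity without spelling it out. As a rough illustration only, the measure is commonly defined as 2 * IC(lcs) / (IC(a) + IC(b)), where IC(c) = -log p(c) is the information content of a concept and lcs is the lowest common ancestor of the two concepts in a taxonomy. The sketch below assumes a tiny hand-made taxonomy and made-up corpus counts (`PARENT` and `COUNTS` are hypothetical); this record does not state which ontology or lexical resource the thesis used, so this is not the author's implementation.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical corpus counts of each concept's own occurrences.
COUNTS = {"entity": 0, "animal": 2, "vehicle": 1, "dog": 5, "cat": 4, "car": 8}


def ancestors(concept):
    """Return the path from concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def subtree_count(concept):
    """Count occurrences of a concept plus all of its descendants."""
    return sum(n for c, n in COUNTS.items() if concept in ancestors(c))


TOTAL = subtree_count("entity")


def information_content(concept):
    """IC(c) = -log p(c), with p(c) estimated from subtree frequency."""
    return -math.log(subtree_count(concept) / TOTAL)


def lin_similarity(a, b):
    """Lin's measure: 2 * IC(lcs) / (IC(a) + IC(b))."""
    if a == b:
        return 1.0
    ancestors_b = set(ancestors(b))
    # The first ancestor of a that is also an ancestor of b is the lowest one.
    lcs = next(c for c in ancestors(a) if c in ancestors_b)
    denom = information_content(a) + information_content(b)
    return 2 * information_content(lcs) / denom if denom > 0 else 0.0


# Rank hypothetical candidate words against a hypothetical domain concept.
candidates = ["cat", "car"]
best = max(candidates, key=lambda w: lin_similarity(w, "dog"))
```

With these assumed counts, "cat" shares the informative ancestor "animal" with "dog" and is ranked above "car", whose only common ancestor with "dog" is the root, which carries no information.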
author2 Chung-Shyan Liu
author Yuan-Lin Chen
陳宛琳
title Integration of Ontology and Semantic Similarity for extracting Keywords from Documents
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/46942011593856699312