Integration of Ontology and Semantic Similarity for extracting Keywords from Documents
Master's === Chung Yuan Christian University === Graduate Institute of Information and Computer Engineering === 102 === A document may contain a large number of words, but only a few of them are keywords that describe its content. These keywords can also be used to distinguish the type of the document. Extracting them requires a sequence of processing steps...
Main Authors: | Yuan-Lin Chen, 陳宛琳 |
---|---|
Other Authors: | Chung-Shyan Liu |
Format: | Others |
Language: | zh-TW |
Published: | 2014 |
Online Access: | http://ndltd.ncl.edu.tw/handle/46942011593856699312 |
id |
ndltd-TW-102CYCU5392030 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-102CYCU5392030 2015-10-13T23:49:49Z http://ndltd.ncl.edu.tw/handle/46942011593856699312 Integration of Ontology and Semantic Similarity for extracting Keywords from Documents 結合本體論與語意相似程度對文件萃取關鍵字 Yuan-Lin Chen 陳宛琳 Master's Chung Yuan Christian University Graduate Institute of Information and Computer Engineering 102 A document may contain a large number of words, but only a few of them are keywords that describe its content. These keywords can also be used to distinguish the type of the document. Extracting them requires a sequence of processing steps. This thesis presents an approach to extracting keywords from documents by combining knowledge in an ontology with semantic similarity. The ontology supplies all the knowledge it describes about each word, and a semantic similarity calculation then selects the most suitable knowledge. Combining the two makes it possible to identify the keywords of a document. First, Lucene, a full-text search tool, is used to extract words from the document content and to remove stop words. A two-stage stemming method then reduces the words to their root forms, and a POS tagger tags them. The meanings of the words are obtained by searching the ontology, and the similarity between them is computed using Lin's semantic similarity. Finally, a subset of keywords is selected using the domain ontology information. Chung-Shyan Liu 留忠賢 2014 學位論文 ; thesis 86 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === Chung Yuan Christian University === Graduate Institute of Information and Computer Engineering === 102 === A document may contain a large number of words, but only a few of them are keywords that describe its content. These keywords can also be used to distinguish the type of the document. Extracting them requires a sequence of processing steps.
This thesis presents an approach to extracting keywords from documents by combining knowledge in an ontology with semantic similarity. The ontology supplies all the knowledge it describes about each word, and a semantic similarity calculation then selects the most suitable knowledge. Combining the two makes it possible to identify the keywords of a document.
First, Lucene, a full-text search tool, is used to extract words from the document content and to remove stop words. A two-stage stemming method then reduces the words to their root forms, and a POS tagger tags them. The meanings of the words are obtained by searching the ontology, and the similarity between them is computed using Lin's semantic similarity. Finally, a subset of keywords is selected using the domain ontology information.
|
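The abstract names Lin's semantic similarity without stating it. In its usual formulation, the similarity of two concepts is twice the information content (IC) of their lowest common subsumer divided by the sum of their own ICs, where IC(c) = -log p(c). The following is a minimal sketch of that formula only, not the thesis implementation: the toy taxonomy `PARENT`, the counts in `COUNTS`, and all function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from the thesis.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical corpus counts of each concept on its own (made-up numbers).
COUNTS = {"entity": 0, "animal": 5, "vehicle": 5, "dog": 30, "cat": 20, "car": 40}


def ancestors(concept):
    """Return the path from a concept up to the root, the concept included."""
    path = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        path.append(concept)
    return path


def cumulative_counts():
    """Add each concept's count to itself and to all of its ancestors."""
    totals = dict.fromkeys(COUNTS, 0)
    for concept, n in COUNTS.items():
        for a in ancestors(concept):
            totals[a] += n
    return totals


TOTALS = cumulative_counts()
ROOT_TOTAL = TOTALS["entity"]


def ic(concept):
    """Information content: -log p(c), written as log(1 / p(c))."""
    return math.log(ROOT_TOTAL / TOTALS[concept])


def lcs(c1, c2):
    """Lowest common subsumer: first ancestor of c2 that also subsumes c1."""
    seen = set(ancestors(c1))
    for a in ancestors(c2):
        if a in seen:
            return a
    raise ValueError("concepts share no common ancestor")


def lin(c1, c2):
    """Lin similarity: 2 * IC(lcs) / (IC(c1) + IC(c2))."""
    if c1 == c2:
        return 1.0
    denom = ic(c1) + ic(c2)
    if denom == 0:
        return 0.0
    return 2 * ic(lcs(c1, c2)) / denom


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car"), ("dog", "dog")]:
        print(pair, round(lin(*pair), 3))
```

With these made-up counts, concepts whose only shared ancestor is the root score zero, because the root has probability 1 and therefore carries no information. In a real setting the counts would come from a corpus and the taxonomy from the ontology being searched.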
author2 |
Chung-Shyan Liu |
author_facet |
Chung-Shyan Liu Yuan-Lin Chen 陳宛琳 |
author |
Yuan-Lin Chen 陳宛琳 |
spellingShingle |
Yuan-Lin Chen 陳宛琳 Integration of Ontology and Semantic Similarity for extracting Keywords from Documents |
author_sort |
Yuan-Lin Chen |
title |
Integration of Ontology and Semantic Similarity for extracting Keywords from Documents |
publishDate |
2014 |
url |
http://ndltd.ncl.edu.tw/handle/46942011593856699312 |