A Study on Applying Support Vector Machines based Categorization Techniques to Measuring Relatedness between Documents
Master's === National Kaohsiung University of Applied Sciences (國立高雄應用科技大學) === Master's Program, Department of Electrical Engineering === 93 === ABSTRACT: Measuring semantic relatedness among documents helps us understand how related and similar the documents in an information source are, which in turn supports acquiring and extending knowledge. Generally speaking, most of...
Main Authors: | Feng-Chih Hsu 徐豐智 |
---|---|
Other Authors: | Chung-Hong Lee 李俊宏 |
Format: | Others |
Language: | zh-TW |
Published: | 2005 |
Online Access: | http://ndltd.ncl.edu.tw/handle/65347825917480062899 |
id |
ndltd-TW-093KUAS0442003 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-093KUAS0442003 2015-10-13T11:39:19Z http://ndltd.ncl.edu.tw/handle/65347825917480062899 A Study on Applying Support Vector Machines based Categorization Techniques to Measuring Relatedness between Documents SupportVectorMachines分類技術應用於文件相關性量測之探討 Feng-Chih Hsu 徐豐智 Master's National Kaohsiung University of Applied Sciences (國立高雄應用科技大學) Master's Program, Department of Electrical Engineering 93 Chung-Hong Lee 李俊宏 2005 學位論文 ; thesis 101 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Kaohsiung University of Applied Sciences (國立高雄應用科技大學) === Master's Program, Department of Electrical Engineering === 93 === ABSTRACT: Measuring semantic relatedness among documents helps us understand how related and similar the documents in an information source are, which in turn supports acquiring and extending knowledge. Most related work on measuring semantic relatedness among documents relies on a specific knowledge source or semantic network (e.g., WordNet) for evaluation: documents are mapped to nodes of the semantic network, and the distance between two nodes gives the degree of relatedness between the corresponding documents.
In this work, we propose a novel method and platform for measuring semantic relatedness among texts. Because Support Vector Machines (SVMs) perform well in text categorization, we use an SVM-based approach to support the evaluation of text relatedness. We establish an innovative model, the so-called semantics-based vector space model, for text evaluation. Based on the decisions made by several trained SVM classifiers, the feature vectors of the original texts, which are represented by terms, are transformed into semantic vectors represented by a combination of the categories of the SVM classifiers. Applying vector measures including Distance, Cosine, Dice, and Jaccard to these semantic vectors yields a quantitative measure of the semantic relatedness between two documents. The experimental results show that the proposed measures are consistent with human judgments. In addition, in distinguishing the sub-themes of text categories, our vector method performed better than traditional keyword-based feature representation.
|
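The four vector measures named in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the record does not give the exact formulas, so the real-valued (extended) forms of Dice and Jaccard are assumed here, and the two example semantic vectors, with one component per SVM category classifier, are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Distance measure: smaller values mean more closely related documents."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine of the angle between the two vectors."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def dice(a, b):
    """Extended Dice coefficient for real-valued vectors (assumed form)."""
    denom = dot(a, a) + dot(b, b)
    return 2 * dot(a, b) / denom if denom else 0.0


def jaccard(a, b):
    """Extended Jaccard (Tanimoto) coefficient for real-valued vectors (assumed form)."""
    denom = dot(a, a) + dot(b, b) - dot(a, b)
    return dot(a, b) / denom if denom else 0.0


# Hypothetical semantic vectors: component i is 1.0 if the i-th SVM
# category classifier accepted the document, else 0.0.
doc_a = [1.0, 0.0, 1.0, 0.0]
doc_b = [1.0, 1.0, 0.0, 0.0]

for name, measure in [("distance", euclidean_distance), ("cosine", cosine),
                      ("dice", dice), ("jaccard", jaccard)]:
    print(f"{name}: {measure(doc_a, doc_b):.4f}")
```

Note that Distance is a dissimilarity (0 for identical vectors), while Cosine, Dice, and Jaccard are similarities, so the scores need to be oriented consistently before comparing them with human judgments.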
author2 |
Chung-Hong Lee |
author_facet |
Chung-Hong Lee Feng-Chih Hsu 徐豐智 |
author |
Feng-Chih Hsu 徐豐智 |
spellingShingle |
Feng-Chih Hsu 徐豐智 A Study on Applying Support Vector Machines based Categorization Techniques to Measuring Relatedness between Documents |
author_sort |
Feng-Chih Hsu |
title |
A Study on Applying Support Vector Machines based Categorization Techniques to Measuring Relatedness between Documents |
title_sort |
study on applying support vector machines based categorization techniques to measuring relatedness between documents |
publishDate |
2005 |
url |
http://ndltd.ncl.edu.tw/handle/65347825917480062899 |
_version_ |
1716846494589386752 |