A Study on Applying Support Vector Machines based Categorization Techniques to Measuring Relatedness between Documents

Master's === National Kaohsiung University of Applied Sciences === Department of Electrical Engineering, Master's Program === 93 === ABSTRACT: Measuring “semantic relatedness” among documents helps us understand how related and how similar the documents in an information source are, and so supports the acquisition and extension of knowledge. Generally speaking, most of...


Bibliographic Details
Main Authors: Feng-Chih Hsu, 徐豐智
Other Authors: Chung-Hong Lee
Format: Others
Language: zh-TW
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/65347825917480062899
id ndltd-TW-093KUAS0442003
record_format oai_dc
spelling ndltd-TW-093KUAS0442003 2015-10-13T11:39:19Z http://ndltd.ncl.edu.tw/handle/65347825917480062899 A Study on Applying Support Vector Machines based Categorization Techniques to Measuring Relatedness between Documents SupportVectorMachines分類技術應用於文件相關性量測之探討 Feng-Chih Hsu 徐豐智 Master's National Kaohsiung University of Applied Sciences Department of Electrical Engineering, Master's Program 93 Chung-Hong Lee 李俊宏 2005 thesis 101 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Kaohsiung University of Applied Sciences === Department of Electrical Engineering, Master's Program === 93 === ABSTRACT: Measuring “semantic relatedness” among documents helps us understand how related and how similar the documents in an information source are, and so supports the acquisition and extension of knowledge. Most related work on measuring semantic relatedness among documents relies on a specific knowledge source or semantic network (e.g. WordNet) for evaluation: documents are mapped to nodes of the semantic network, and the distance between two nodes gives the degree of relatedness between the two documents. In this work, we propose a novel method and platform for measuring semantic relatedness among texts. Because Support Vector Machines (SVMs) perform well in text categorization, we employ SVMs to support the evaluation of text relatedness. We establish a model, called the semantics-based vector space model, for text evaluation. Based on the decisions of several trained SVM classifiers, the feature vectors of the original texts, which are represented by terms, are transformed into semantic vectors represented by a combination of the SVM classifiers' categories. Vector measures, including Distance, Cosine, Dice, and Jaccard, then quantify the semantic relatedness between two documents. In our experiments, the results of the proposed measures are consistent with human judgments. In addition, in distinguishing the sub-themes of text categories, our vector method performed better than the traditional keyword-based feature representation.
author2 Chung-Hong Lee
author_facet Chung-Hong Lee
Feng-Chih Hsu
徐豐智
author Feng-Chih Hsu
徐豐智
title A Study on Applying Support Vector Machines based Categorization Techniques to Measuring Relatedness between Documents
publishDate 2005
url http://ndltd.ncl.edu.tw/handle/65347825917480062899
_version_ 1716846494589386752
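The abstract's pipeline (term vectors mapped to category-based semantic vectors, then compared with Distance, Cosine, Dice, and Jaccard) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not taken from the thesis: the category names, the hand-set linear weights standing in for trained SVM classifiers, the function names, and the real-valued forms of the Dice and Jaccard measures, which the abstract does not spell out.

```python
import math

# Hypothetical stand-ins for trained linear SVM classifiers:
# category -> (term weights, bias). A real system would learn these.
CLASSIFIERS = {
    "sports": ({"match": 1.0, "team": 0.5}, -0.2),
    "finance": ({"stock": 1.0, "market": 0.8}, -0.2),
}

def decision_score(term_vec, weights, bias):
    """Linear decision value w.x + b for one category classifier."""
    return sum(weights.get(t, 0.0) * v for t, v in term_vec.items()) + bias

def semantic_vector(term_vec, classifiers):
    """Map a term-frequency dict to one score per category (sorted by name)."""
    return [decision_score(term_vec, w, b)
            for _, (w, b) in sorted(classifiers.items())]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    denom = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / denom if denom else 0.0

def dice(a, b):
    # Real-valued Dice coefficient: 2(a.b) / (|a|^2 + |b|^2)
    denom = dot(a, a) + dot(b, b)
    return 2 * dot(a, b) / denom if denom else 0.0

def jaccard(a, b):
    # Real-valued (Tanimoto) form: a.b / (|a|^2 + |b|^2 - a.b)
    denom = dot(a, a) + dot(b, b) - dot(a, b)
    return dot(a, b) / denom if denom else 0.0

# Two toy documents as term-frequency dicts (made-up data).
doc_a = {"match": 2, "team": 1}
doc_b = {"match": 1, "market": 1}

vec_a = semantic_vector(doc_a, CLASSIFIERS)
vec_b = semantic_vector(doc_b, CLASSIFIERS)

for name, measure in [("distance", euclidean_distance), ("cosine", cosine),
                      ("dice", dice), ("jaccard", jaccard)]:
    print(name, round(measure(vec_a, vec_b), 4))
```

In this sketch the semantic vector has one dimension per category rather than one per term, so two documents that share few keywords can still score as related when the classifiers place them in similar categories.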