A Formal Concept Analysis-Based Document Representation and its Application on Document Clustering
Doctoral === National Central University === Graduate Institute of Information Management === 99 === With the continual improvement in internet-related technology, more and more information, especially text-based information, becomes available online. The implementation of most of these techniques draws upon Salton’s vector space model (VSM) in which documents o...
Main Authors: | Chin-Yi Cheng 鄭敬譯 |
---|---|
Other Authors: | Shihchieh Chou 周世傑 |
Format: | Others |
Language: | en_US |
Published: | 2011 |
Online Access: | http://ndltd.ncl.edu.tw/handle/03233117926847532638 |
id |
ndltd-TW-099NCU05396056 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-099NCU05396056 2017-07-13T04:20:27Z http://ndltd.ncl.edu.tw/handle/03233117926847532638 A Formal Concept Analysis-Based Document Representation and its Application on Document Clustering 以形式概念分析為基礎之文件向量模型建立方式及其於文件分群之應用 Chin-Yi Cheng 鄭敬譯 Doctoral National Central University Graduate Institute of Information Management 99 Shihchieh Chou 周世傑 2011 thesis 73 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Doctoral === National Central University === Graduate Institute of Information Management === 99 === With the continual improvement in internet-related technology, more and more information, especially text-based information, is becoming available online. Most techniques for processing this information draw upon Salton’s vector space model (VSM), in which documents or query strings are represented by vectors. Most VSM-based implementations use the individual terms extracted from the documents or query strings as the dimensions of the vectors, and the frequency with which each term appears in the document or query string as the value along that dimension. These implementations, the so-called bag-of-terms methods, ignore conceptual relationships between terms, such as synonyms, hypernyms and hyponyms, that have been shown to improve the effectiveness of information retrieval, document classification and document clustering. To address the problem of automatically constructing a thesaurus for a given document set, in this study we apply formal concept analysis (FCA) to construct a term ontology that captures the hierarchical conceptual relationships, together with the synonym-like relationships, within the document set. We also develop a document representation method that applies this ontology to represent documents as concept-based vectors. To evaluate the usability and effectiveness of our method, we use document clustering as the application for assessing the generated concept-based vectors. |
author2 |
Shihchieh Chou |
author_facet |
Shihchieh Chou Chin-Yi Cheng 鄭敬譯 |
author |
Chin-Yi Cheng 鄭敬譯 |
author_sort |
Chin-Yi Cheng |
title |
A Formal Concept Analysis-Based Document Representation and its Application on Document Clustering |
publishDate |
2011 |
url |
http://ndltd.ncl.edu.tw/handle/03233117926847532638 |
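The abstract in this record contrasts the bag-of-terms vector space model with a representation built on formal concept analysis. The following is a minimal Python sketch of both ideas, not the thesis's actual method: the toy corpus `DOCS` and the function names `term_frequency_vectors` and `formal_concepts` are hypothetical, and the concept enumeration is a brute-force illustration that is only practical for very small document sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus for illustration; not taken from the thesis.
DOCS = {
    "d1": "car engine repair",
    "d2": "automobile engine repair",
    "d3": "fruit market",
}


def term_frequency_vectors(docs):
    """Bag-of-terms VSM: one dimension per distinct term, value = term frequency."""
    vocab = sorted({t for text in docs.values() for t in text.split()})
    vectors = {}
    for name, text in docs.items():
        counts = Counter(text.split())
        vectors[name] = [counts[t] for t in vocab]
    return vocab, vectors


def formal_concepts(docs):
    """Enumerate the formal concepts (extent, intent) of the document-term context.

    A formal concept is a pair (set of documents, set of terms) in which the
    terms are exactly those shared by all the documents, and the documents are
    exactly those containing all the terms.
    """
    context = {name: frozenset(text.split()) for name, text in docs.items()}
    all_terms = frozenset().union(*context.values())

    def common_terms(doc_names):
        # Terms shared by every listed document (all terms for the empty set).
        sets = [context[d] for d in doc_names]
        return frozenset.intersection(*sets) if sets else all_terms

    def docs_having(terms):
        # Documents that contain every term in `terms`.
        return frozenset(d for d, ts in context.items() if terms <= ts)

    concepts = set()
    names = sorted(context)
    # Brute force: close every subset of documents (exponential in corpus size).
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            intent = common_terms(subset)
            concepts.add((docs_having(intent), intent))
    return concepts


if __name__ == "__main__":
    vocab, vectors = term_frequency_vectors(DOCS)
    print(vocab)
    for name, vec in vectors.items():
        print(name, vec)
    for extent, intent in sorted(formal_concepts(DOCS), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

In the bag-of-terms vectors, "car" and "automobile" occupy unrelated dimensions, so d1 and d2 overlap only on "engine" and "repair". The concept whose extent is {d1, d2} and whose intent is {engine, repair} makes that shared context explicit, which is the kind of structure a concept-based vector could draw on.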