A Formal Concept Analysis-Based Document Representation and its Application on Document Clustering

Doctoral === National Central University === Graduate Institute of Information Management === 99 === With the continual improvement in internet-related technology, more and more information, especially text-based information, becomes available online. The implementation of most of these techniques draws upon Salton’s vector space model (VSM) in which documents o...

Bibliographic Details
Main Authors: Chin-Yi Cheng, 鄭敬譯
Other Authors: Shihchieh Chou
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/03233117926847532638
id ndltd-TW-099NCU05396056
record_format oai_dc
spelling ndltd-TW-099NCU05396056 2017-07-13T04:20:27Z http://ndltd.ncl.edu.tw/handle/03233117926847532638 A Formal Concept Analysis-Based Document Representation and its Application on Document Clustering 以形式概念分析為基礎之文件向量模型建立方式及其於文件分群之應用 Chin-Yi Cheng 鄭敬譯 Doctoral National Central University Graduate Institute of Information Management 99 Shihchieh Chou 周世傑 2011 thesis 73 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Doctoral === National Central University === Graduate Institute of Information Management === 99 === With the continual improvement of internet-related technology, more and more information, especially text-based information, is becoming available online. The implementation of most of these techniques draws upon Salton’s vector space model (VSM), in which documents or query strings are represented by vectors. Most VSM-based implementations use the individual terms extracted from the documents or query strings as the dimensions of the vectors, and the frequency with which each term appears in the document or query string as the value along that dimension. These implementations, the so-called bag-of-terms methods, ignore conceptual relationships between terms, such as synonyms, hypernyms and hyponyms, which have been shown to improve the effectiveness of information retrieval, document classification and document clustering. To address the problem of automatically constructing a thesaurus for a given document, we apply formal concept analysis (FCA) in this study to construct a term ontology that captures both the hierarchical conceptual relationships and the synonym-like relationships in the document set. We also develop a document representation method that applies the ontology to represent documents as concept-based vectors. To evaluate the usability and effectiveness of our method, we use document clustering as the application in which the generated concept-based vectors are tested.
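The contrast the abstract draws between bag-of-terms vectors and concept-based vectors can be illustrated with a minimal Python sketch. It is not the thesis's method: the two toy documents are made up, and the hand-written `TERM_TO_CONCEPT` mapping is a hypothetical stand-in for the FCA-derived term ontology, which the record does not describe in detail. The sketch only shows why term-frequency vectors miss synonym-like relationships and how mapping terms to shared concepts changes the similarity between two documents.

```python
from collections import Counter
from math import sqrt


def tf_vector(tokens, dimensions):
    """Count how often each dimension label occurs in the token list."""
    counts = Counter(tokens)
    return [counts[d] for d in dimensions]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy documents, invented for illustration only.
docs = ["car engine repair", "automobile motor repair"]
tokenized = [d.lower().split() for d in docs]

# Bag-of-terms: one dimension per distinct term, term frequency as the value.
vocabulary = sorted({t for tokens in tokenized for t in tokens})
term_vectors = [tf_vector(tokens, vocabulary) for tokens in tokenized]

# Hypothetical term-to-concept mapping (an assumption, standing in for an
# ontology that would be derived from the document set).
TERM_TO_CONCEPT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "engine": "engine",
    "motor": "engine",
    "repair": "repair",
}

# Concept-based: one dimension per concept, concept frequency as the value.
concepts = sorted(set(TERM_TO_CONCEPT.values()))
concept_tokens = [[TERM_TO_CONCEPT.get(t, t) for t in tokens] for tokens in tokenized]
concept_vectors = [tf_vector(tokens, concepts) for tokens in concept_tokens]

term_similarity = cosine(term_vectors[0], term_vectors[1])
concept_similarity = cosine(concept_vectors[0], concept_vectors[1])

print("term-based similarity:   ", term_similarity)
print("concept-based similarity:", concept_similarity)
```

In this toy case the two documents share only the term "repair" in the bag-of-terms space, while in the concept space they coincide, which is the kind of effect a clustering algorithm operating on concept-based vectors could exploit.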
author2 Shihchieh Chou
author_facet Shihchieh Chou
Chin-Yi Cheng
鄭敬譯
author Chin-Yi Cheng
鄭敬譯
title A Formal Concept Analysis-Based Document Representation and its Application on Document Clustering
publishDate 2011
url http://ndltd.ncl.edu.tw/handle/03233117926847532638