Conceptual Text Mining with Hierarchical Knowledge Structures

Doctoral === National Cheng Kung University === Institute of Information Management === 101 === Text mining is a critical technique to manage huge collections of documents. However, most existing text mining algorithms are easily affected by ambiguous terms. The ability to disambiguate for a classifier is thus as important as the ability to classify accura...

Bibliographic Details
Main Authors: Fu-Ching Tsai, 蔡馥璟
Other Authors: Sheng-Tun Li
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/64633430782280228010
id ndltd-TW-101NCKU5396001
record_format oai_dc
spelling ndltd-TW-101NCKU5396001 2015-10-13T22:01:27Z http://ndltd.ncl.edu.tw/handle/64633430782280228010 Conceptual Text Mining with Hierarchical Knowledge Structures 結合階層式知識結構之文本分析 Fu-Ching Tsai 蔡馥璟 Doctoral, National Cheng Kung University, Institute of Information Management, 101 Sheng-Tun Li 李昇暾 2013 學位論文 ; thesis 55 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Doctoral === National Cheng Kung University === Institute of Information Management === 101 === Text mining is a critical technique for managing huge collections of documents. However, most existing text mining algorithms are easily affected by ambiguous terms, so a classifier's ability to disambiguate is as important as its ability to classify accurately. Knowledge structures (KS) have proven to be efficient in discovering the hidden structural relations and implications of knowledge, so that significant reasoning patterns can be retrieved to enhance the efficiency of text analysis. In this research, we propose a conceptual text mining framework based on two hierarchical KS models, a lattice and a tree, to examine the efficiency of incorporating hierarchical KS for retrieving context from a corpus in text mining tasks. The first model uses fuzzy formal concept analysis to conceptualize documents into a more abstract form of concepts and uses these concepts as training examples, alleviating the arbitrary outcomes caused by ambiguous terms. The model is evaluated on a benchmark testbed and two opinion polarity datasets, and the experimental results indicate superior performance on all datasets. Applying concept analysis to opinion polarity classification is a pioneering effort in the disambiguation of Web 2.0 content, and the approach offers significant improvements over current methods. The results also show that the model is less sensitive to noise and adapts to cross-domain applications. However, the lattice-based model suffers from high computational complexity, which limits its use on big data.
To address this issue, we propose a new approach that constructs a tree-based KS from a corpus; it reveals the significant relations among knowledge objects and provides concise entity relations that avoid computational overload. The effectiveness of the second model is demonstrated on two representative public datasets. The evaluation results show that the method achieves remarkable consistency with the domain-specific knowledge structure and reflects appropriate similarities among knowledge objects, along with hierarchical implications, in the document classification task.
author2 Sheng-Tun Li
author_facet Sheng-Tun Li
Fu-Ching Tsai
蔡馥璟
author Fu-Ching Tsai
蔡馥璟
spellingShingle Fu-Ching Tsai
蔡馥璟
Conceptual Text Mining with Hierarchical Knowledge Structures
author_sort Fu-Ching Tsai
title Conceptual Text Mining with Hierarchical Knowledge Structures
title_sort conceptual text mining with hierarchical knowledge structures
publishDate 2013
url http://ndltd.ncl.edu.tw/handle/64633430782280228010