Mining of Frequent Embedded Unordered Subtrees

Master's === National Taiwan University of Science and Technology === Department of Information Management === 104 === Tree mining has been a very popular research topic in the data mining field because it can be applied to many kinds of tree-structured documents, such as web logs, XML documents, and XBRL (eXtensible Business Reporting Language) documents for financial statements...

Bibliographic Details
Main Authors:	Cheng-Jhe Li, 李承哲
Other Authors:	Chiun-Chieh Hsu
Format:	Others
Language:	zh-TW
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/30316068815480403208

id	ndltd-TW-104NTUS5396057
record_format	oai_dc
spelling	ndltd-TW-104NTUS5396057 2017-10-29T04:35:14Z http://ndltd.ncl.edu.tw/handle/30316068815480403208 Mining of Frequent Embedded Unordered Subtrees 頻繁內嵌無序子樹探勘 Cheng-Jhe Li 李承哲 Master's, National Taiwan University of Science and Technology, Department of Information Management, 104 Chiun-Chieh Hsu 徐俊傑 2016 學位論文 ; thesis 65 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Taiwan University of Science and Technology === Department of Information Management === 104 === Tree mining has been a very popular research topic in the data mining field because it can be applied to many kinds of tree-structured documents, such as web logs, XML documents, XBRL (eXtensible Business Reporting Language) documents for financial statements, and other semi-structured documents. However, efficiently finding all frequent subtrees in a tree database has always been a main concern of researchers. The main challenges are avoiding duplicate enumeration of subtrees and identifying potentially frequent candidate subtrees so as to accelerate mining. We choose rooted labeled unordered trees as the trees stored in the database: labeled trees can fully represent real tree-structured data, and rooted trees, unlike unrooted trees, do not require finding center nodes to serve as roots when creating canonical forms. In addition, since mining ordered trees is likely to miss some potentially frequent subtrees, we take unordered trees as the target trees. We mine embedded subtrees, which generalize induced subtrees: they allow not only direct parent-child branches but also ancestor-descendant branches. Mining embedded subtrees is therefore more difficult than mining induced subtrees. We propose an algorithm named MFEUT (Mining of Frequent Embedded Unordered subTrees) to solve the frequent embedded unordered subtree mining problem effectively and efficiently. MFEUT generates candidate subtrees by equivalence class extension and uses a growth restriction to avoid enumerating subtrees redundantly. To keep an overly stringent growth restriction from preventing some subtrees from being produced, we use an extension operation and label range computing to ensure that all subtrees can be enumerated. In addition, we propose a new data structure called the node info list to accelerate the computation of label ranges. For candidate subtree pruning, we present a novel strategy called FPC (Frequent Path Checking), which prunes candidate subtrees that are not potentially frequent and thereby accelerates the mining process. Our empirical study on synthetic and real datasets demonstrates that MFEUT achieves a substantial performance gain over SLEUTH, SLEUTH–F2SC, and POTMiner.
author2	Chiun-Chieh Hsu
author_facet	Chiun-Chieh Hsu Cheng-Jhe Li 李承哲
author	Cheng-Jhe Li 李承哲
spellingShingle	Cheng-Jhe Li 李承哲 Mining of Frequent Embedded Unordered Subtrees
author_sort	Cheng-Jhe Li
title	Mining of Frequent Embedded Unordered Subtrees
title_short	Mining of Frequent Embedded Unordered Subtrees
title_full	Mining of Frequent Embedded Unordered Subtrees
title_fullStr	Mining of Frequent Embedded Unordered Subtrees
title_full_unstemmed	Mining of Frequent Embedded Unordered Subtrees
title_sort	mining of frequent embedded unordered subtrees
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/30316068815480403208
work_keys_str_mv	AT chengjheli miningoffrequentembeddedunorderedsubtrees AT lǐchéngzhé miningoffrequentembeddedunorderedsubtrees AT chengjheli pínfánnèiqiànwúxùzishùtànkān AT lǐchéngzhé pínfánnèiqiànwúxùzishùtànkān
_version_	1718558463067422720
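
The abstract relies on two ideas that a small example may make concrete: unordered trees that differ only in sibling order share one canonical form, and an embedded pattern edge may match any ancestor-descendant pair while an induced edge must match a direct parent-child pair. The sketch below is illustrative only and is not the MFEUT algorithm; the tuple-based tree representation and the function names `canonical`, `descendant_labels`, and `occurs` are assumptions introduced here, not taken from the thesis.

```python
# Illustrative sketch only; NOT the thesis's MFEUT implementation.
# Assumed representation: a rooted labeled tree is (label, [child trees]).

def canonical(tree):
    """Return a string that ignores sibling order by sorting child encodings."""
    label, children = tree
    return label + "(" + ",".join(sorted(canonical(c) for c in children)) + ")"

def descendant_labels(tree):
    """Yield the labels of all proper descendants of the root of `tree`."""
    _, children = tree
    for child in children:
        yield child[0]
        yield from descendant_labels(child)

def occurs(tree, parent, child, embedded):
    """Check whether the two-node pattern parent -> child occurs in `tree`.

    embedded=False: `child` must label a direct child (induced edge).
    embedded=True:  `child` may label any descendant (embedded edge).
    """
    label, children = tree
    if label == parent:
        if embedded:
            candidates = descendant_labels(tree)
        else:
            candidates = (c[0] for c in children)
        if child in candidates:
            return True
    # Otherwise, look for the pattern deeper in the tree.
    return any(occurs(c, parent, child, embedded) for c in children)

# Two trees that differ only in the order of the root's children.
t1 = ("A", [("B", [("C", [])]), ("D", [])])
t2 = ("A", [("D", []), ("B", [("C", [])])])

same_form = canonical(t1) == canonical(t2)
induced_hit = occurs(t1, "A", "C", embedded=False)
embedded_hit = occurs(t1, "A", "C", embedded=True)
```

In this example `t1` and `t2` receive the same canonical string, and the pattern A -> C is found only under the embedded definition, because C is a grandchild of A rather than a child.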

Similar Items