Mining of Frequent Embedded Unordered Subtrees

Master's === National Taiwan University of Science and Technology === Department of Information Management === 104 === Tree mining has been a very popular research topic in the data mining field. This is because it can be applied to many kinds of tree-structured documents, such as web logs, XML documents, XBRL (eXtensible Business Reporting Language) documents for financial statements...

Bibliographic Details
Main Authors: Cheng-Jhe Li, 李承哲
Other Authors: Chiun-Chieh Hsu
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/30316068815480403208
id ndltd-TW-104NTUS5396057
record_format oai_dc
spelling ndltd-TW-104NTUS5396057 2017-10-29T04:35:14Z http://ndltd.ncl.edu.tw/handle/30316068815480403208 Mining of Frequent Embedded Unordered Subtrees 頻繁內嵌無序子樹探勘 Cheng-Jhe Li 李承哲 Master's National Taiwan University of Science and Technology Department of Information Management 104 Chiun-Chieh Hsu 徐俊傑 2016 Degree thesis ; thesis 65 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan University of Science and Technology === Department of Information Management === 104 === Tree mining has been a very popular research topic in the data mining field because it can be applied to many kinds of tree-structured documents, such as web logs, XML documents, XBRL (eXtensible Business Reporting Language) documents for financial statements, and other semi-structured documents. How to efficiently find all frequent subtrees in a tree database has long been the main concern of many researchers. The main challenges are avoiding duplicate enumeration of subtrees and identifying potentially frequent candidate subtrees, in order to accelerate mining. We choose rooted labeled unordered trees as the trees stored in the database: labeled trees can fully represent real tree-structured data, and rooted trees, unlike unrooted trees, do not require finding center nodes to serve as roots when creating canonical forms. In addition, since mining ordered trees is likely to miss some potentially frequent subtrees, we take unordered trees as the target. We mine embedded subtrees, which generalize induced subtrees: they allow not only direct parent-child branches but also ancestor-descendant branches. Mining embedded subtrees is therefore more difficult than mining induced subtrees. We propose an algorithm named MFEUT (Mining of Frequent Embedded Unordered subTrees) to solve the frequent embedded unordered subtree mining problem effectively and efficiently. Candidate subtree generation in MFEUT is based on equivalence class extension and uses a growth restriction to avoid enumerating subtrees redundantly. To keep an overly stringent growth restriction from preventing some subtrees from being generated, we use an extension operation and label range computation to ensure that all subtrees can be enumerated. In addition, we propose a new data structure, the node info list, to accelerate label range computation. For candidate subtree pruning, we present a novel strategy called FPC (Frequent Path Checking), which prunes candidate subtrees with no potential to be frequent and thereby accelerates the mining process. Our empirical study on synthetic and real datasets demonstrates that MFEUT achieves a substantial performance gain over SLEUTH, SLEUTH-F2SC, and POTMiner.
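The distinction the abstract draws between induced and embedded subtrees can be illustrated on the smallest possible patterns, a single branch between two labels. The sketch below is not MFEUT or any part of the thesis; it is a brute-force illustration under assumptions of our own: trees are written as `(label, [children])` tuples, the two-tree database is made up, and the helper names `induced_pairs`, `embedded_pairs`, and `support` are hypothetical. It counts, for each two-node pattern, the number of database trees that contain it, once with direct parent-child branches only and once with ancestor-descendant branches allowed.

```python
from collections import Counter

# Assumed representation: a tree is (label, [children]). Because results are
# collected in sets, the order of children does not matter (unordered trees).

def labels(tree):
    """All labels that occur in the tree."""
    label, children = tree
    found = {label}
    for child in children:
        found |= labels(child)
    return found

def induced_pairs(tree):
    """Label pairs (parent, child) joined by a direct edge."""
    label, children = tree
    pairs = set()
    for child in children:
        pairs.add((label, child[0]))
        pairs |= induced_pairs(child)
    return pairs

def embedded_pairs(tree):
    """Label pairs (ancestor, descendant) at any depth below the ancestor."""
    label, children = tree
    pairs = set()
    for child in children:
        pairs |= {(label, descendant) for descendant in labels(child)}
        pairs |= embedded_pairs(child)
    return pairs

def support(database, pairs_of):
    """For each two-node pattern, the number of trees containing it."""
    counts = Counter()
    for tree in database:
        counts.update(pairs_of(tree))  # a set, so each tree counts once
    return counts

# Made-up database for illustration only.
database = [
    ("a", [("b", [("c", [])])]),    # chain a - b - c
    ("a", [("c", []), ("d", [])]),  # a with children c and d
]

induced = support(database, induced_pairs)
embedded = support(database, embedded_pairs)
print(induced[("a", "c")], embedded[("a", "c")])  # prints: 1 2
```

With a minimum support of 2, the pattern a-c is frequent only when embedded occurrences are counted, because in the first tree c is a grandchild of a rather than a child. Every induced pair is also an embedded pair, which is the sense in which embedded subtrees generalize induced ones and why the embedded search space is larger.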