Retrieving Representative Structures from XML Documents Using Clustering Techniques



Bibliographic Details
Main Authors: Po-Lun Liou, 劉博倫
Other Authors: Yin-Fu Huang
Format: Others
Language: en_US
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/89488738673347465186
Description
Summary: Master's thesis === Yunlin University of Science and Technology === Graduate School of Electronic and Information Engineering === 98 === In this paper, we address the problem of finding the common structures in a collection of XML documents. Since an XML document can be represented as a tree, the problem of clustering a collection of XML documents can be treated as the problem of clustering a collection of tree-structured documents. First, we used a SOM (Self-Organizing Map) with the Jaccard coefficient to cluster the XML documents. Then, an efficient sequential mining method called GST was applied to find the maximum frequent sequences in each cluster. Finally, we merged the maximum frequent sequences to produce the common structures of the cluster.
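
The Jaccard coefficient mentioned in the summary can be illustrated with a small sketch. The record does not say how the thesis turns an XML tree into features, so the representation here (the set of root-to-element tag paths) is an assumption made only for illustration, and the helper names path_set and jaccard are hypothetical. The SOM clustering and GST mining steps are not shown.

```python
import xml.etree.ElementTree as ET


def path_set(xml_text):
    """Return the set of root-to-element tag paths of an XML document.

    Assumption: a document is described by its tag paths only;
    text content, attributes, and sibling order are ignored.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Three toy documents (made up for this example).
doc1 = "<book><title/><author><name/></author></book>"
doc2 = "<book><title/><author><name/><email/></author><year/></book>"
doc3 = "<order><item><price/></item></order>"

p1, p2, p3 = path_set(doc1), path_set(doc2), path_set(doc3)
print(jaccard(p1, p2))  # doc1 and doc2 share 4 of 6 distinct paths
print(jaccard(p1, p3))  # no shared paths
```

Under this representation, doc1 and doc2 score high and would tend to land in the same cluster, while doc3 shares no paths with either. A clustering step such as a SOM would operate on similarities of this kind.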