The Novel Twig-Join Algorithm with Structure for Efficient Retrieval of XML Documents

Ph.D. === National Sun Yat-sen University === Department of Computer Science and Engineering (graduate program) === academic year 104 (ROC calendar, i.e. 2015/16)


Bibliographic Details
Main Authors: Yi-wei Kung, 龔奕瑋
Other Authors: Chungnan Lee
Format: Others
Language: en_US
Published: 2016
Online Access:http://ndltd.ncl.edu.tw/handle/n986b7
Record ID: ndltd-TW-104NSYS5392036
Chinese title: 新樹枝連結演算法結合結構對XML文件有效率的檢索 (A New Twig-Join Algorithm Combined with Structure for Efficient Retrieval of XML Documents)
Other Authors (Chinese name): 李宗南
Type: Thesis (學位論文); 75
Collection: NDLTD
Abstract: In recent years, XML (eXtensible Markup Language) has become a standard format for data representation and data exchange on the Internet. At the same time, the number of data queries keeps growing, and because of the huge amount of data it is increasingly difficult to query efficiently and to obtain precisely the required results. The main operation in XML query processing is finding the nodes in a document that match a given query tree pattern (QTP). The problem is that accessing many useless nodes in order to match a query pattern is very time-consuming. Moreover, because XML documents are organized by their structure, structural ambiguity in a query can produce results that lack clarity. How to provide efficient query services on a well-designed representation that supports query diversification and resolves ambiguity, and thereby achieves high-precision search, is therefore an important issue. To reduce query time, we use the structural summary tree (SST) algorithm to optimize XML documents by eliminating unnecessary paths, including nested structures and duplicate paths. We propose a novel twig-join algorithm, twig-join Swift (TJSwift), associated with adjacent linked (AL) lists, to provide efficient XML query services in which queries can be versatile in terms of predicates. TJSwift completely preserves hierarchical information, and a new index generated from the SST stores semantic information to provide template-based indexing for fast data searches. To the best of our knowledge, research is still insufficient on query diversification, on queries over a single node and across differences in hierarchical level, and on intermediate nodes that are ambiguous with respect to hierarchical level. In terms of result relevance, effectiveness is the most crucial aspect of query search, and all of these issues bear on it.
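The abstract does not spell out how the SST removes nested and duplicate paths. As a rough illustration only, the Python sketch below builds a DataGuide-style path summary under two assumptions of ours: duplicate root-to-node tag paths are merged, and a run of the same tag (for example `section/section`) is treated as a nested structure and collapsed into one step. So that hierarchy is not discarded entirely, each summary path records the element depths it stands for. The names `label_paths`, `collapse_nesting`, `structural_summary` and `SAMPLE_DOC` are hypothetical and do not come from the thesis.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical sketch of a path summary; NOT the thesis's SST algorithm.
Path = Tuple[str, ...]


def label_paths(root: ET.Element) -> Iterator[Path]:
    """Yield the root-to-node tag path of every element in document (pre)order."""
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        yield path
        # Push children in reverse so the leftmost child is visited first.
        for child in reversed(list(node)):
            stack.append((child, path + (child.tag,)))


def collapse_nesting(path: Path) -> Path:
    """Assumed rule: merge runs of a repeated tag (section/section -> section)."""
    out: List[str] = []
    for tag in path:
        if not out or out[-1] != tag:
            out.append(tag)
    return tuple(out)


def structural_summary(xml_text: str) -> Dict[Path, List[int]]:
    """Map each distinct, nesting-collapsed path to the depths it summarizes."""
    root = ET.fromstring(xml_text)
    summary: Dict[Path, Set[int]] = {}
    for path in label_paths(root):
        # Duplicate paths land on the same key; the depth keeps level info.
        summary.setdefault(collapse_nesting(path), set()).add(len(path))
    return {p: sorted(depths) for p, depths in summary.items()}


SAMPLE_DOC = (
    "<book><chapter><section><section><title/></section></section>"
    "<title/></chapter><chapter><title/></chapter></book>"
)

if __name__ == "__main__":
    for path, depths in structural_summary(SAMPLE_DOC).items():
        print("/".join(path), depths)
```

On the sample document, eight elements reduce to five summary paths, and `book/chapter/section` keeps the depths 3 and 4 of its two nested `section` elements. A real index would also need pointers from each summary path back to the elements it covers, which this sketch omits.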
Hence, we further propose extending twig-join Swift (eXTJSwift), also associated with AL lists, to provide efficient XML query services in which queries can be versatile in terms of predicates. To compare the performance of TJSwift and eXTJSwift with that of TwigStack, TwigList and TJFast, we conducted two sets of performance evaluations. TJSwift was evaluated in terms of total execution time, scalability, and the number of elements read, which indicates how many nodes must be read during matching. eXTJSwift was evaluated in terms of total execution time and the number of paths matched; in addition, various query criteria were added to compare precision with respect to query diversification, target hierarchical level, and ambiguity. The TJSwift experiments show that the proposed algorithms not only satisfy the queries but are also more time-efficient than existing twig-join algorithms such as TJFast. Similarly, eXTJSwift achieved better accuracy than the other approaches with respect to query diversification, target hierarchical level, and ambiguity.
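Twig-join algorithms such as TwigStack are commonly described over per-tag streams of region-encoded elements, with a stack-based ancestor-descendant join as the binary building block. The Python sketch below shows only that building block, not TJSwift or eXTJSwift: it encodes each element as (start, end, level), joins `chapter//title`, and counts the elements read, the metric named above. The optional `target_level` argument is our assumption of how a target hierarchical level could disambiguate a tag that occurs at several depths; `Region`, `region_encode`, `descendant_join` and `SAMPLE_DOC` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, NamedTuple, Optional, Tuple


class Region(NamedTuple):
    start: int  # counter value when the element opens
    end: int    # counter value when it closes; descendants lie strictly inside
    level: int  # depth in the tree, root = 1


def region_encode(xml_text: str) -> Dict[str, List[Region]]:
    """Return one stream of regions per tag, sorted in document order."""
    streams: Dict[str, List[Region]] = {}
    counter = 0

    def visit(node: ET.Element, level: int) -> None:
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            visit(child, level + 1)
        counter += 1
        streams.setdefault(node.tag, []).append(Region(start, counter, level))

    visit(ET.fromstring(xml_text), 1)
    for regions in streams.values():
        regions.sort()  # regions were appended on close; sort by start
    return streams


def descendant_join(
    ancestors: List[Region],
    descendants: List[Region],
    target_level: Optional[int] = None,  # hypothetical level filter
) -> Tuple[List[Tuple[Region, Region]], int]:
    """Stack-based ancestor//descendant join; returns (pairs, elements read)."""
    pairs: List[Tuple[Region, Region]] = []
    stack: List[Region] = []  # chain of nested ancestors, innermost on top
    i = j = reads = 0
    while j < len(descendants):
        d = descendants[j]
        if i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            i += 1
            reads += 1
            while stack and stack[-1].end < a.start:
                stack.pop()  # top region closed before a opened
            stack.append(a)
        else:
            j += 1
            reads += 1
            while stack and stack[-1].end < d.start:
                stack.pop()  # drop ancestors that do not contain d
            if target_level is None or d.level == target_level:
                pairs.extend((anc, d) for anc in stack)
    return pairs, reads


SAMPLE_DOC = (
    "<book><chapter><title/><section><title/></section></chapter>"
    "<title/></book>"
)
```

On the sample, `chapter//title` matches the chapter's own title (level 3) and the title nested in its section (level 4) after reading four elements; with `target_level=3` only the first match remains. The recursive encoder is fine for small documents but would need an iterative rewrite for very deep ones.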