The Novel Twig-Join Algorithm with Structure for Efficient Retrieval of XML Documents

Bibliographic Details
Main Authors: Yi-wei Kung, 龔奕瑋
Other Authors: Chungnan Lee
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/n986b7
Description
Summary: PhD === National Sun Yat-sen University === Department of Computer Science and Engineering === 104 === In recent years, XML (eXtensible Markup Language) has become the standard format for data representation and data exchange on the Internet. As the amount of data and the number of queries keep growing, it is becoming harder to query efficiently and obtain precisely the required results. The main operation in XML query processing is finding the nodes in a document that match a given query tree pattern (QTP). The problem is that accessing many useless nodes in order to match a query pattern is very time-consuming. Meanwhile, because queries over XML documents depend on document structure, structural ambiguity in a query can lead to results that lack clarity. An important issue is therefore how to provide an efficient query service, based on a representation that supports query diversification and resolves ambiguity, in order to achieve high-precision search. To overcome the time-consumption problem, we use the structural summary tree (SST) algorithm to optimize XML documents; the aim is to eliminate unnecessary paths, including nested structures and duplicate paths. We propose the novel twig-join Swift (TJSwift) algorithm, associated with adjacent linked (AL) lists, to provide efficient XML query services in which queries can be versatile in terms of predicates. It completely preserves hierarchical information, and the new index generated from the SST stores semantic information in order to provide template-based indexing for fast data searches. At the same time, to the best of our knowledge, research is insufficient on query diversification, on queries involving a single node and hierarchical level differences, and on intermediate nodes whose hierarchical level is ambiguous. In terms of result relevance, effectiveness is the most crucial aspect of query search, and it can be summarized by these issues.
Hence, we further propose the extended twig-join Swift (eXTJSwift) algorithm, also associated with AL lists, to provide efficient XML query services in which queries can be versatile in terms of predicates. To compare the performance of the TJSwift and eXTJSwift approaches with that of TwigStack, TwigList and TJFast, we conducted two sets of performance evaluations. TJSwift was evaluated in terms of total execution time, scalability and number of elements read, which indicates how many nodes must be read in a matching process. eXTJSwift was evaluated in terms of total execution time and number of paths matched, with various query criteria added to compare precision with respect to query diversification, target hierarchical level and the problem of ambiguity. The experimental results show that TJSwift not only satisfies the queries but also takes less time than existing twig-join algorithms such as TJFast. Similarly, eXTJSwift achieved better accuracy than the other approaches in terms of query diversification, target hierarchical level and the problem of ambiguity.
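
The idea of removing duplicate paths before matching can be illustrated with a generic path summary. The following Python sketch is not the SST algorithm from the thesis, whose details are not given in this record; it only collects the distinct root-to-node label paths of a small document and counts how many nodes share each one. The function name path_summary and the sample document DOC are hypothetical and introduced purely for illustration.

```python
import xml.etree.ElementTree as ET


def path_summary(xml_text):
    """Map each distinct root-to-node label path to the number of nodes on it.

    Generic path summary for illustration only; this is an assumption,
    not the thesis's SST construction.
    """
    root = ET.fromstring(xml_text)
    summary = {}
    # Iterative depth-first traversal; each stack entry is (node, label path).
    stack = [(root, (root.tag,))]
    while stack:
        node, path = stack.pop()
        summary[path] = summary.get(path, 0) + 1
        # Push children in reverse so they are visited in document order.
        for child in reversed(list(node)):
            stack.append((child, path + (child.tag,)))
    return summary


# Hypothetical sample document: 8 element nodes in total.
DOC = (
    "<lib>"
    "<book><title/><author/></book>"
    "<book><title/><author/><author/></book>"
    "</lib>"
)

summary = path_summary(DOC)
for path, count in summary.items():
    print("/".join(path), count)
```

In this sample, eight element nodes collapse to four distinct label paths, so a matcher that first consults the summary could skip any query path that does not occur in the document at all. How TJSwift actually uses its SST-derived index is described in the thesis itself.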