Summary: | Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 92 === Abstract
In this thesis, we discuss query processing against XML data. Because XML documents are tree-structured, a tree-structured query consists of the usual value constraints together with structural constraints. A structural constraint is based on paths, and the relationship between two nodes in a path can be either parent-child or ancestor-descendant. Because the query includes both kinds of constraints, the final result must satisfy both at the same time. However, structural constraints are more difficult to process than value constraints, so processing the structural constraints is the key to XML query processing.
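To make the two kinds of constraints concrete, the following minimal sketch uses the limited XPath support in Python's standard `xml.etree.ElementTree` module. It is only an illustration, not the method of this thesis; the sample document and the element names (`library`, `book`, `title`) are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
doc = ET.fromstring(
    "<library>"
    "<book><title>XML</title>"
    "<chapter><section><title>Paths</title></section></chapter></book>"
    "<book><title>SQL</title></book>"
    "</library>"
)

# Parent-child constraint: title elements that are direct children of a book.
child_titles = [t.text for t in doc.findall("./book/title")]

# Ancestor-descendant constraint: title elements at any depth below a book.
descendant_titles = [t.text for t in doc.findall("./book//title")]

# Structural plus value constraint: books with a child title whose text is "XML".
xml_books = doc.findall("./book[title='XML']")

print(child_titles)
print(descendant_titles)
print(len(xml_books))
```

The second query returns more nodes than the first because it also accepts titles nested inside chapters and sections, which is why ancestor-descendant steps generally need more than a simple path lookup.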
In this thesis, we propose four special data structures. First, we transform the query that the user inputs into a structure called the “QueryTree”, which records the structural and value constraints of the input query. Second, since the DTD records all possible paths in the XML document, we propose the “EP-tree”, which is based on the DTD, to efficiently resolve the parent-child constraints. Third, we transform every element in the XML document into an “EC-Tuple” by using interval encoding, and group the EC-Tuples into “EC-Tables” according to their corresponding paths. Interval encoding lets us efficiently resolve the ancestor-descendant constraints. Finally, we propose the “Value-Index”, which is based on the B+tree and is used to resolve the value constraints of the query.
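The idea behind interval encoding can be sketched as follows. This is a generic version of the technique under stated assumptions, not the exact EC-Tuple or EC-Table layout of this thesis: the tuple fields `(start, end, level)`, the grouping by root-to-node path, and the names `encode`, `is_ancestor`, and `is_parent` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def encode(root):
    """Map each root-to-node path to a list of (start, end, level) tuples.

    Assumed layout: start and end come from one counter that is advanced
    on entering and on leaving an element during a depth-first traversal.
    """
    tables = defaultdict(list)
    counter = 0

    def visit(elem, path, level):
        nonlocal counter
        counter += 1
        start = counter
        full_path = path + "/" + elem.tag
        for child in elem:
            visit(child, full_path, level + 1)
        counter += 1
        tables[full_path].append((start, counter, level))

    visit(root, "", 0)
    return tables


def is_ancestor(a, d):
    # a is an ancestor of d exactly when a's interval strictly contains d's.
    return a[0] < d[0] and d[1] < a[1]


def is_parent(a, d):
    # A parent is an ancestor exactly one level above.
    return is_ancestor(a, d) and d[2] == a[2] + 1


# Hypothetical sample document.
tables = encode(ET.fromstring("<a><b><c/></b><c/></a>"))
a = tables["/a"][0]
b = tables["/a/b"][0]
deep_c = tables["/a/b/c"][0]
shallow_c = tables["/a/c"][0]
print(is_ancestor(a, deep_c), is_parent(a, deep_c), is_ancestor(b, shallow_c))
```

With this encoding, an ancestor-descendant check is two integer comparisons and needs no tree traversal, which is the property that makes interval encoding attractive for this kind of constraint.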
Based on these data structures, we design a set of algorithms to efficiently retrieve the result of a tree-structured query from an XML document. The basic idea of our approach is to resolve the parent-child constraints first, then the ancestor-descendant constraints, and finally the twig points.
|