M*Ctree: A Multi-Resolution Indexing Structure for XML Data

Bibliographic Details
Main Author:	Guruvadoo, Eranna K.
Published:	NSUWorks 2007
Subjects:	Computer Sciences
Online Access:	http://nsuworks.nova.edu/gscis_etd/559

Description
Summary:	XML has emerged as a universal data exchange format for disseminating and sharing information, particularly on the World Wide Web. As more XML data are generated, stored, and exchanged, the need to index XML data efficiently for querying purposes is becoming increasingly important. Designing efficient indexing structures for XML data presents serious challenges because any indexing scheme must index the structural as well as the data components of XML documents and provide tight integration of the two components. This thesis studies XML indexing methods for tree-structured XML documents that can be queried by a subset of XPath expressions. More specifically, this thesis proposes a new main memory index structure, named M*Ctree, which is an enhanced Ctree. Unlike the Ctree, which is constructed solely from the structural and data characteristics of the database, the M*Ctree includes query workload characteristics in order to speed up query evaluation on frequently used paths and nodes of the index structure. The Ctree index cleverly uses arrays to preserve child-parent relationships among individual data node pairs in a summary tree structure in order to avoid expensive structural join costs. The M*Ctree combines the use of child-parent links with additional arrays which provide child-ancestor links along frequently used paths to accelerate query evaluation. These child-ancestor links can be pre-computed based on query workloads, and added or removed as needed to reflect changing workload characteristics. Combined with value indexes which are structure-and-content sensitive, the M*Ctree becomes a multi-resolution index structure optimized for frequently used paths and achieves better overall performance than index structures which do not consider query workloads. The M*Ctree trades extra memory for the additional arrays in exchange for better execution times for queries along frequently used paths of the index structure.
Experiments conducted in this research show that the M*Ctree achieves better performance than the Ctree for both simple and branching path queries matching index paths with child-ancestor links. The M*Ctree achieves larger performance gains over the Ctree on index paths which do not contain regular groups.
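
The idea of child-parent arrays supplemented by pre-computed child-ancestor arrays can be illustrated with a small sketch. This is not the thesis's implementation: the toy data in `parent_links`, the `shortcuts` table, and all function names are hypothetical, and the layout assumes one array per level of a single index path, where entry `i` holds the position of node `i`'s parent in the level above.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data for one index path. parent_links[k][i] is the
# position, in level k-1, of the parent of node i in level k.
parent_links: List[List[int]] = [
    [],            # level 0: a single root node, no parent
    [0, 0],        # level 1: two nodes, both under root node 0
    [0, 1, 1],     # level 2: three nodes
    [0, 0, 2, 2],  # level 3: four nodes
]

# Hypothetical table of pre-computed child-ancestor arrays, keyed by
# (descendant level, ancestor level).
shortcuts: Dict[Tuple[int, int], List[int]] = {}


def ancestor_by_walking(level: int, idx: int, target_level: int) -> int:
    """Follow child-parent links one level at a time."""
    while level > target_level:
        idx = parent_links[level][idx]
        level -= 1
    return idx


def add_shortcut(level: int, target_level: int) -> None:
    """Pre-compute a child-ancestor array for a frequently used path."""
    shortcuts[(level, target_level)] = [
        ancestor_by_walking(level, i, target_level)
        for i in range(len(parent_links[level]))
    ]


def remove_shortcut(level: int, target_level: int) -> None:
    """Drop a child-ancestor array when the workload no longer needs it."""
    shortcuts.pop((level, target_level), None)


def ancestor(level: int, idx: int, target_level: int) -> int:
    """Use a pre-computed array if one exists, else walk parent links."""
    links = shortcuts.get((level, target_level))
    if links is not None:
        return links[idx]  # single array lookup
    return ancestor_by_walking(level, idx, target_level)


add_shortcut(3, 1)
print(shortcuts[(3, 1)])  # [0, 0, 1, 1]
```

In this sketch a lookup from level 3 to level 1 costs one array access once the shortcut exists, instead of two parent hops, at the price of one extra array held in memory. That is the same trade-off the summary describes, in miniature.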

Similar Items