M*Ctree: A Multi-Resolution Indexing Structure for XML Data

XML has emerged as a universal data exchange format for disseminating and sharing information, particularly on the World Wide Web. As more XML data are generated, stored, and exchanged, the need to index XML data efficiently for querying purposes is becoming increasingly important. Designing efficie...

Bibliographic Details
Main Author: Guruvadoo, Eranna K.
Published: NSUWorks 2007
Subjects: Computer Sciences
Online Access: http://nsuworks.nova.edu/gscis_etd/559
id ndltd-nova.edu-oai-nsuworks.nova.edu-gscis_etd-1558
record_format oai_dc
spelling ndltd-nova.edu-oai-nsuworks.nova.edu-gscis_etd-1558 2016-04-25T19:40:51Z M*Ctree: A Multi-Resolution Indexing Structure for XML Data Guruvadoo, Eranna K. 2007-01-01T08:00:00Z text http://nsuworks.nova.edu/gscis_etd/559 CEC Theses and Dissertations NSUWorks Computer Sciences
collection NDLTD
sources NDLTD
topic Computer Sciences
description XML has emerged as a universal data exchange format for disseminating and sharing information, particularly on the World Wide Web. As more XML data are generated, stored, and exchanged, the need to index XML data efficiently for querying is becoming increasingly important. Designing efficient indexing structures for XML data presents serious challenges because any indexing scheme must index the structural as well as the data components of XML documents and integrate the two tightly. This thesis studies XML indexing methods for tree-structured XML documents that can be queried by a subset of XPath expressions. More specifically, it proposes a new main-memory index structure, named M*Ctree, which is an enhanced Ctree. Unlike the Ctree, which is constructed solely from the structural and data characteristics of the database, the M*Ctree also incorporates query workload characteristics in order to speed up query evaluation on frequently used paths and nodes of the index structure. The Ctree index uses arrays to preserve child-parent relationships among individual data node pairs in a summary tree structure, which avoids expensive structural joins. The M*Ctree combines these child-parent links with additional arrays that provide child-ancestor links along frequently used paths to accelerate query evaluation. These child-ancestor links can be pre-computed from query workloads, and added or removed as needed to reflect changing workload characteristics. Combined with value indexes that are sensitive to both structure and content, the M*Ctree becomes a multi-resolution index structure optimized for frequently used paths, and it achieves better overall performance than index structures that do not consider query workloads. The M*Ctree spends extra memory on the additional arrays in exchange for better execution times for queries along frequently used paths of the index structure.
Experiments conducted in this research show that the M*Ctree outperforms the Ctree for both simple and branching path queries that match index paths with child-ancestor links. The M*Ctree achieves a larger performance gain over the Ctree on index paths that do not contain regular groups.
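The idea of array-based child-parent links, with optional child-ancestor shortcut arrays on frequently used paths, can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Group` class, its field names, the label-keyed shortcut layout, and the book/chapter/section sample data are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Group:
    """One summary-tree node: the data nodes that share a label path (hypothetical layout)."""
    label: str
    parent_group: Optional["Group"] = None
    # parent_link[i] = position, within parent_group, of the parent of element i
    parent_link: List[int] = field(default_factory=list)
    # shortcuts[ancestor_label][i] = position of element i's ancestor in that group
    shortcuts: Dict[str, List[int]] = field(default_factory=dict)


def climb(group: Group, index: int, ancestor_label: str) -> int:
    """Resolve an ancestor by following child-parent links one level at a time."""
    while group.label != ancestor_label:
        if group.parent_group is None:
            raise KeyError(f"no ancestor group labelled {ancestor_label!r}")
        index = group.parent_link[index]
        group = group.parent_group
    return index


def add_shortcut(group: Group, ancestor_label: str) -> None:
    """Pre-compute a child-ancestor array, e.g. for a path the workload uses often."""
    group.shortcuts[ancestor_label] = [
        climb(group, i, ancestor_label) for i in range(len(group.parent_link))
    ]


def remove_shortcut(group: Group, ancestor_label: str) -> None:
    """Drop a shortcut array when the workload no longer justifies its memory."""
    group.shortcuts.pop(ancestor_label, None)


def ancestor_of(group: Group, index: int, ancestor_label: str) -> int:
    """Use the shortcut array if one exists; otherwise fall back to climbing."""
    shortcut = group.shortcuts.get(ancestor_label)
    if shortcut is not None:
        return shortcut[index]
    return climb(group, index, ancestor_label)


def build_example():
    """Two books, three chapters, four sections (made-up sample data)."""
    book = Group("book")
    chapter = Group("chapter", parent_group=book, parent_link=[0, 0, 1])
    section = Group("section", parent_group=chapter, parent_link=[0, 1, 1, 2])
    return book, chapter, section
```

In this sketch, `ancestor_of(section, 3, "book")` returns the same position with or without the shortcut; the shortcut only replaces the level-by-level climb with a single array lookup, at the cost of one extra array per shortcut.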
author Guruvadoo, Eranna K.
title M*Ctree: A Multi-Resolution Indexing Structure for XML Data
publisher NSUWorks
publishDate 2007
url http://nsuworks.nova.edu/gscis_etd/559