A practical approach to set orientated query execution in semistructured databases


Bibliographic Details
Main Author: Du, Chu-Ming
Published: University of Birmingham 2003
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.412622
Description
Summary: The amount of semistructured data is growing rapidly as the World Wide Web has developed into a central means for sharing and disseminating information. The structure of tree-like semistructured data is not rigid. The most common instance of this type of data is XML. Applications endeavouring to access components of semistructured data are naturally inclined towards a recursive approach to navigate data on trees. However, conventional wisdom indicates that a set-oriented mechanism is necessary for database management systems to obtain good performance in the presence of large amounts of data. Our main objective in this thesis is to develop a set-oriented query execution scheme for XML data. We propose a system, called "Equate" (Execution of Queries Using an Automata Theoretic Engine), which intelligently utilises an automata rewriting scheme to transform a query language into an internal query plan with relational-like operators scheduled in a single process for a set-oriented execution. Our approach contains two phases. The first phase, set-oriented execution, performs queries on edges and binds any variables required. The second phase, reachability analysis, refines the result, filtering out any false matches, and collects sets of variable bindings into a final result structure. A novel aspect of our approach is that our set-oriented execution, even for complex queries, requires only variants of the relational select, project, and union operators, but no joins.
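
The two phases described in the summary can be illustrated with a small sketch. This is not the Equate implementation, which the record does not describe in detail; it is a minimal Python illustration under stated assumptions. The edge-table layout, the sample document, the automaton for the path /bib/book/title, and the names `set_oriented_phase` and `reachability_phase` are all hypothetical. The first phase uses only selection, projection, and union over the edge set, so it may admit false matches; the second phase walks from the root and discards them.

```python
from collections import namedtuple

# One edge of the XML tree: parent node id, child node id, element label.
# (Hypothetical storage layout, chosen for illustration only.)
Edge = namedtuple("Edge", ["parent", "child", "label"])

# Hypothetical document:
# <bib><book><title/></book><article><title/></article></bib>
EDGES = [
    Edge(0, 1, "bib"),
    Edge(1, 2, "book"),
    Edge(2, 3, "title"),
    Edge(1, 4, "article"),
    Edge(4, 5, "title"),
]

# Hypothetical automaton for the path query /bib/book/title,
# written as (from_state, label, to_state); state 3 is accepting.
TRANSITIONS = [(0, "bib", 1), (1, "book", 2), (2, "title", 3)]
ROOT, START, ACCEPT = 0, 0, 3


def set_oriented_phase(edges, transitions):
    """Phase 1: select and project per transition, then union. No joins."""
    candidates = set()
    for src, label, dst in transitions:
        selected = {e for e in edges if e.label == label}        # select
        projected = {(e.parent, e.child, src, dst) for e in selected}  # project
        candidates |= projected                                  # union
    return candidates


def reachability_phase(candidates, root, start, accept):
    """Phase 2: keep only nodes reached from (root, start) in an accepting state."""
    by_source = {}
    for parent, child, src, dst in candidates:
        by_source.setdefault((parent, src), []).append((child, dst))
    results = set()
    seen = {(root, start)}
    frontier = [(root, start)]
    while frontier:
        key = frontier.pop()
        for child, dst in by_source.get(key, []):
            if dst == accept:
                results.add(child)
            if (child, dst) not in seen:
                seen.add((child, dst))
                frontier.append((child, dst))
    return results


candidates = set_oriented_phase(EDGES, TRANSITIONS)
matches = reachability_phase(candidates, ROOT, START, ACCEPT)
```

In this sketch the first phase also returns the title edge under article, because it looks at each edge in isolation; the reachability walk removes it since no path from the root reaches that edge in the right automaton state.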