Summary: | Master's === National Sun Yat-sen University === Department of Computer Science and Engineering === 90 ===
In recent years, many people have used the World Wide Web and the
Internet to find the information they want. HTML is a document markup
language for publishing hypertext on the WWW. HTML has been the
target format for content developers around the world. Basically,
HTML tags serve the primary purpose of describing how to display a
data item. It is therefore difficult to find useful information in
HTML documents, because they mix content with display tags. XML, on
the other hand, is another data format, used for data exchange
between inter-enterprise applications on the Internet. In order to
facilitate data exchange, industry groups
define public Document Type Definitions (DTD) that specify the
format of the XML documents to be exchanged between their
applications. Moreover, WWW/EDI, or electronic commerce, is very
popular, and a great deal of business data is exchanged as XML on
the World Wide Web. Basically, XML tags describe the data itself.
The content (meaning) of an XML document is separated from its
display format, so it is easy to find and analyze meaningful
information in XML documents. Moreover, when a large volume of
business data (XML
documents) exists, one way to support the management of the XML
documents is to use a relational database. This approach requires
transforming the XML documents into relational form. In this
thesis, we design and implement indexing strategies for efficient
access to XML documents. An XML document is fundamentally different
from relational data: it is hierarchical and nested, and closely
resembles the semistructured data model. The characteristic of
semistructured
data is that it may not have a fixed schema and it may be
irregular or incomplete. Although the semistructured data model is
flexible in data modeling, it requires a large search space in
query processing, since no schema is fixed in advance. Indexing is
one way to improve query performance. However, due to the special
properties of semistructured data, there are up to five types of
queries: (1)
complete single path, (2) specified leaf only, (3) specified
intrapath, (4) specified attribute/element (value), and (5)
multiple paths at the same level. In this thesis, we classify
all possible queries into those five query types. Next, we create
different indexes for different query types. Moreover, we design
and implement the query transformation from XML query statements
to SQL statements. Also, we create a user-friendly interface for
users to input XML query statements. The whole system is
implemented in Java and SQL Server 2000. Our experience shows that
our indexing strategies considerably improve XML query processing
performance.
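
The general idea of storing XML documents in a relational database and
translating path queries into SQL can be illustrated with a minimal
sketch. It is not the system described in this thesis, which is
implemented in Java and SQL Server 2000; Python and SQLite are used
here only to keep the example self-contained. The table layout (`node`),
the function names, the sample document, and the reading of "complete
single path" as a full root-to-leaf path and "specified leaf only" as a
query naming just the leaf element are all assumptions made for
illustration. Attributes are ignored.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the thesis.
XML_DOC = """
<orders>
  <order id="1">
    <customer>Alice</customer>
    <item>Book</item>
  </order>
  <order id="2">
    <customer>Bob</customer>
    <item>Pen</item>
  </order>
</orders>
"""


def shred(xml_text):
    """Flatten XML into (path, leaf, value) rows, one per text-bearing element."""
    rows = []

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        text = (elem.text or "").strip()
        if text:
            rows.append((path, elem.tag, text))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return rows


def load(rows):
    """Store the rows in an in-memory table with one index per query type."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (path TEXT, leaf TEXT, value TEXT)")
    conn.execute("CREATE INDEX idx_path ON node (path)")  # complete single path
    conn.execute("CREATE INDEX idx_leaf ON node (leaf)")  # specified leaf only
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", rows)
    return conn


def to_sql(query):
    """Translate a tiny path query into a parameterized SQL statement.

    '/a/b/c' is treated as a complete single path, '//c' as leaf only.
    """
    if query.startswith("//"):
        return "SELECT value FROM node WHERE leaf = ? ORDER BY rowid", (query[2:],)
    return "SELECT value FROM node WHERE path = ? ORDER BY rowid", (query,)


def run(conn, query):
    sql, params = to_sql(query)
    return [row[0] for row in conn.execute(sql, params)]


conn = load(shred(XML_DOC))
print(run(conn, "/orders/order/customer"))  # ['Alice', 'Bob']
print(run(conn, "//item"))                  # ['Book', 'Pen']
```

Each query form maps to an equality lookup on a separately indexed
column, which is one simple way to give different query types different
indexes. The remaining three query types would need additional
translation rules that this sketch does not attempt.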
|