Hybrid Keyword Search Across Peer-to-Peer Federated Data
Other Authors: | |
---|---|
Format: | Others |
Language: | English |
Published: | Florida State University |
Subjects: | |
Online Access: | http://purl.flvc.org/fsu/fd/FSU_migr_etd-3052 |
Summary: | The Internet provides a general communication environment for distributed resource sharing. XML has become a key technology for information representation and exchange on the Internet, increasing the opportunity for integration of the various data formats. The World Wide Web (WWW) is the example par excellence of a document-based distributed system on the Internet. As the size of the Web has increased, various problems with looking up a resource location on the Internet have emerged. Web search engines provide clues for resource location, but they have no semantic schema and often produce meaningless keyword search results. The Semantic Web suggests an alternative solution for the semantic problem on the Web. It provides multiple relation links with directed labeled graphs, so that machines such as Web crawlers can understand the relationships between different resources. But due to the need for sophisticated domain description and the lack of unified definitions, many Web pages are not part of the Semantic Web. Meanwhile, recent public attention to peer-to-peer (P2P) networks has stimulated research on overlay P2P networks on top of the Internet. Those studies open possibilities for another form of distributed resource sharing on the Internet. In this dissertation we describe the design of a hybrid search that combines metadata search with a traditional keyword search over unstructured context data. This hybrid search paradigm gives the inquirer additional options to narrow the search with some semantic aspects through the XML metadata query. We tackle the scalability limitations of a single-machine implementation by adopting a distributed architecture. This scalable hybrid search assembles a complete query result from individual queries run against independent data fragments distributed across a computer cluster. We demonstrate that our architecture extends the scalability of native XML querying beyond the limits of a single machine and improves query performance. Finally, we generalize our hybrid architecture to more scalable searches over a P2P overlay network. This generalization may give an intermediate search paradigm on the Internet, providing semantic value through XML metadata that are simpler than those of the Semantic Web. === A Dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy. === Spring Semester, 2005. === April 6, 2005. === Keyword Search, Data Integration, Peer-To-Peer, Information Retrieval === Includes bibliographical references. === Gregory Riccardi, Professor Co-Directing Dissertation; Geoffrey C. Fox, Professor Co-Directing Dissertation; Lawrence Dennis, Outside Committee Member; Gordon Erlebacher, Committee Member; David Whalley, Committee Member. |
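The hybrid search outlined in the summary (an XML metadata query combined with a keyword search, run against independent data fragments and merged into one result) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the `Fragment` class, the `hybrid_search` function, the `(tag, value)` form of the metadata query, and the use of a thread pool to stand in for cluster nodes are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Fragment:
    """Hypothetical data fragment: doc_id -> (XML metadata string, unstructured text)."""
    docs: dict

    def search(self, tag, value, keyword):
        """Return ids of documents matching BOTH the metadata and the keyword query."""
        hits = []
        for doc_id, (metadata_xml, text) in self.docs.items():
            root = ET.fromstring(metadata_xml)
            # Metadata query: some element named `tag` has exactly the text `value`.
            metadata_ok = any(el.text == value for el in root.iter(tag))
            # Keyword query: naive whitespace tokenization of the unstructured text.
            keyword_ok = keyword.lower() in text.lower().split()
            if metadata_ok and keyword_ok:
                hits.append(doc_id)
        return hits


def hybrid_search(fragments, tag, value, keyword):
    """Send the same query to every fragment and merge the partial results."""
    # Threads stand in for the machines of a cluster (an assumption of this sketch).
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda f: f.search(tag, value, keyword), fragments)
        merged = [doc_id for part in partials for doc_id in part]
    return sorted(merged)


if __name__ == "__main__":
    fragments = [
        Fragment({"d1": ("<meta><subject>physics</subject></meta>", "grid computing notes")}),
        Fragment({"d2": ("<meta><subject>biology</subject></meta>", "grid computing notes")}),
    ]
    print(hybrid_search(fragments, "subject", "physics", "grid"))
```

Each fragment answers independently, so adding fragments spreads the work without changing the query itself; the metadata condition narrows what the keyword match alone would return.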