Hybrid Keyword Search Across Peer-to-Peer Federated Data
Author: | Kim, Jungkee |
---|---|
Format: | Others |
Language: | English |
Published: | Florida State University |
Subjects: | Computer science |
Online Access: | http://purl.flvc.org/fsu/fd/FSU_migr_etd-3052 |
Author: | Kim, Jungkee |
Committee: | Gregory Riccardi, Professor Co-Directing Dissertation; Geoffrey C. Fox, Professor Co-Directing Dissertation; Lawrence Dennis, Outside Committee Member; Gordon Erlebacher, Committee Member; David Whalley, Committee Member |
Degree-granting department: | Department of Computer Science |
Degree-granting institution: | Florida State University |
Degree: | Doctor of Philosophy, Spring Semester, 2005 |
Date: | April 6, 2005 |
Physical description: | 1 online resource (computer; application/pdf) |
Keywords: | Keyword Search, Data Integration, Peer-To-Peer, Information Retrieval |
Subject: | Computer science |
Identifier: | FSU_migr_etd-3052 |
Notes: | A dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Includes bibliographical references. |
Rights: | This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). The copyright in theses and dissertations completed at Florida State University is held by the students who author them. |

Abstract:

The Internet provides a general communication environment for distributed resource sharing. XML has become a key technology for representing and exchanging information on the Internet, increasing the opportunity to integrate various data formats. The World Wide Web (WWW) is the example par excellence of a document-based distributed system on the Internet. As the Web has grown, various problems with locating resources on the Internet have emerged. Web search engines provide clues to resource locations, but they have no semantic schema and often produce meaningless keyword search results. The Semantic Web suggests an alternative solution to this semantic problem: it provides multiple relation links in directed labeled graphs, so that machines such as Web crawlers can understand the relationships between different resources. However, because it requires sophisticated domain descriptions and lacks unified definitions, many Web pages are not part of the Semantic Web. Meanwhile, recent public attention to peer-to-peer (P2P) networks has stimulated research on P2P overlay networks built on top of the Internet. These studies open possibilities for another form of distributed resource sharing on the Internet.

In this dissertation we describe the design of a hybrid search that combines metadata search with a traditional keyword search over unstructured context data. This hybrid search paradigm gives the inquirer additional options to narrow the search by semantic criteria through an XML metadata query. We tackle the scalability limitations of a single-machine implementation by adopting a distributed architecture. This scalable hybrid search assembles a complete query result from individual queries against independent data fragments distributed across a computer cluster. We demonstrate that our architecture extends the scalability of native XML queries beyond the limits of a single machine and improves query performance. Finally, we generalize our hybrid architecture to more scalable searches over a P2P overlay network. This generalization may provide an intermediate search paradigm on the Internet, one that delivers semantic value through XML metadata that are simpler than those of the Semantic Web.
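The abstract describes the hybrid search only at the architectural level: an XML metadata query narrows the candidate set, a keyword search matches the unstructured content, and the distributed version combines individual queries against independent data fragments into one result. The Python sketch below illustrates that flow under loudly stated assumptions. The `Document` record, the use of ElementTree path expressions as a stand-in for a native XML query engine, simple term-count scoring, and a thread pool standing in for cluster nodes or P2P peers are all hypothetical choices for illustration, not the dissertation's implementation.

```python
"""Hypothetical sketch of a hybrid (XML metadata + keyword) search that
scatters a query over independent data fragments and gathers the results.
All names and design choices are illustrative assumptions, not the
dissertation's code."""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import re
import xml.etree.ElementTree as ET


@dataclass
class Document:
    doc_id: str
    metadata_xml: str  # structured XML metadata describing the document
    content: str       # unstructured text matched by keywords


def matches_metadata(doc: Document, path: str) -> bool:
    """Metadata step (assumed): the document qualifies if the ElementTree
    path expression, a limited XPath subset, selects at least one element."""
    root = ET.fromstring(doc.metadata_xml)
    return bool(root.findall(path))


def keyword_score(doc: Document, keywords: list[str]) -> int:
    """Keyword step (assumed scoring): count occurrences of each query term."""
    tokens = re.findall(r"\w+", doc.content.lower())
    return sum(tokens.count(k.lower()) for k in keywords)


def search_fragment(fragment, path, keywords):
    """Run the hybrid query against one fragment (e.g. one cluster node)."""
    hits = []
    for doc in fragment:
        if not matches_metadata(doc, path):
            continue  # narrowed out by the metadata query
        score = keyword_score(doc, keywords)
        if score > 0:
            hits.append((score, doc.doc_id))
    return hits


def hybrid_search(fragments, path, keywords):
    """Scatter the query to every fragment in parallel, then gather the
    partial results and merge them into one ranked list of document ids."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda f: search_fragment(f, path, keywords),
                                 fragments))
    merged = [hit for part in partials for hit in part]
    merged.sort(key=lambda h: (-h[0], h[1]))  # highest score first, then id
    return [doc_id for _, doc_id in merged]


if __name__ == "__main__":
    demo = [
        [Document("a", "<doc><subject>P2P</subject></doc>", "peer search over peer overlay"),
         Document("b", "<doc><subject>XML</subject></doc>", "peer data")],
        [Document("c", "<doc><subject>P2P</subject></doc>", "keyword search on peer networks")],
    ]
    print(hybrid_search(demo, "subject[.='P2P']", ["peer", "search"]))  # prints ['a', 'c']
```

The thread pool only keeps the sketch self-contained; in a cluster or P2P setting each `search_fragment` call would presumably run on a remote node or peer, while the gather-and-merge step could stay essentially the same.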