Semantically-Aware Data Discovery and Placement in Collaborative Computing Environments

Bibliographic Details
Main Author: Wang, Xinqi
Other Authors: Adkins, William A.
Format: Others
Language: en
Published: LSU 2012
Subjects: Computer Science
Online Access: http://etd.lsu.edu/docs/available/etd-03082012-073140/
id ndltd-LSU-oai-etd.lsu.edu-etd-03082012-073140
record_format oai_dc
spelling ndltd-LSU-oai-etd.lsu.edu-etd-03082012-073140 2013-01-07T22:53:46Z Semantically-Aware Data Discovery and Placement in Collaborative Computing Environments Wang, Xinqi Computer Science Adkins, William A. Busch, Konstantin Allen, Gabrielle Kosar, Tevfik Chen, Jianhua LSU 2012-03-15 text application/pdf http://etd.lsu.edu/docs/available/etd-03082012-073140/ en unrestricted I hereby certify that, if appropriate, I have obtained and attached herein a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to LSU or its agents the non-exclusive license to archive and make accessible, under the conditions specified below and in appropriate University policies, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report.
I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.
collection NDLTD
sources NDLTD
topic Computer Science
description <p align="justify">As the size of scientific datasets and the demand for interdisciplinary collaboration grow in modern science, it becomes imperative to develop better ways of discovering and placing datasets generated across multiple disciplines in order to facilitate interdisciplinary scientific research.</p> <p align="justify">For discovering relevant data in large-scale interdisciplinary datasets, the development and integration of cross-domain metadata is critical, as metadata serves as the key guideline for organizing data. To develop and integrate cross-domain metadata management systems in an interdisciplinary collaborative computing environment, three key issues must be addressed: the development of a cross-domain metadata schema; the implementation of a metadata management system based on this schema; and the integration of the metadata system into existing distributed computing infrastructure.</p> <p align="justify">Current research on metadata management in distributed computing environments largely focuses on relatively simple schemas that lack the descriptive power to adequately address the semantic heterogeneity often found in interdisciplinary science. Current work also does not adequately consider scalability in large-scale data management.</p> <p align="justify">Another key issue in data management is data placement. Due to the increasing size of scientific datasets, the overhead of transferring data among nodes has grown into a significant factor limiting overall performance. Currently, few data placement strategies take into consideration semantic information about data content.</p>
<p align="justify">In this dissertation, we propose a cross-domain metadata system for a collaborative distributed computing environment, and we identify and evaluate the key factors and processes involved in a successful cross-domain metadata system, with the goal of facilitating data discovery in collaborative environments. This will allow researchers and users to conduct interdisciplinary science on large-scale datasets by making interdisciplinary datasets easier to access, reducing barriers to collaboration, and reducing the cost of developing similar systems in the future.</p> <p align="justify">We also investigate data placement strategies that incorporate semantic information about the hardware and network environment, as well as domain information in the form of semantic metadata, so that semantic locality can be exploited in data placement, potentially reducing the overhead of accessing large-scale interdisciplinary datasets.</p>