Supporting domain heterogeneous data sources for semantic integration
A SEMantic Integration System (SemIS) allows a query over one database to be answered using the knowledge managed in multiple databases in the system. It does so by translating a query across the collaborative databases in which data is autonomously managed in heterogeneous schemas. In this thesis,...
Main Author: | Xu, Jian |
---|---|
Language: | English |
Published: | University of British Columbia, 2011 |
Online Access: | http://hdl.handle.net/2429/36583 |
id |
ndltd-UBC-oai-circle.library.ubc.ca-2429-36583 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UBC-oai-circle.library.ubc.ca-2429-36583; 2018-01-05T17:25:11Z; Supporting domain heterogeneous data sources for semantic integration; Xu, Jian; Science, Faculty of; Computer Science, Department of; Graduate; 2011-08-08T16:58:21Z; 2011-08-08T16:58:21Z; 2011; 2011-11; Text; Thesis/Dissertation; http://hdl.handle.net/2429/36583; eng; Attribution-NonCommercial-ShareAlike 3.0 Unported; http://creativecommons.org/licenses/by-nc-sa/3.0/; University of British Columbia |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
description |
A SEMantic Integration System (SemIS) allows a query over one database to be answered using the knowledge managed in multiple databases in the system. It does so by translating a query across the collaborating databases, in which data is autonomously managed under heterogeneous schemas. In this thesis, we investigate the challenges that arise in enabling domain heterogeneous (DH) databases to collaborate in a SemIS. In such a setting, distributed databases modeled as independent data sources are pairwise mapped to form the semantic overlay network (SON) of the SemIS. We study the two problems we believe are most important for enabling a SemIS to integrate DH data sources.
The first problem tackled in this thesis is how to organize data sources so that query answering remains efficient despite the increased level of source heterogeneity. This problem is modeled as an “Acquaintance Selection” problem. Our solution helps data sources choose appropriate acquaintances with which to create schema mappings, and therefore allows a SemIS to have a single-layered and flexible SON.
The second problem tackled in this thesis is how to translate aggregate queries across DH data sources, where objects are usually represented and managed at different granularities. We focus our study on relational databases and propose novel techniques that allow a (non-aggregate) query to be answered by aggregations over objects at a finer granularity. The new query answering framework, named “decomposition aggregation query (DAQ)” processing, integrates data sources holding information in different domains and at different granularities. New challenges are identified and tackled in a systematic way. We also study query optimizations for DAQ to provide efficient and scalable query processing.
The solutions for both problems are evaluated empirically using real-life data and synthetic data sets. The empirical studies verified our theoretical claims and showed the feasibility, applicability (for real-life applications) and scalability of the techniques and solutions. Science, Faculty of; Computer Science, Department of; Graduate |
author |
Xu, Jian |
title |
Supporting domain heterogeneous data sources for semantic integration |
publisher |
University of British Columbia |
publishDate |
2011 |
url |
http://hdl.handle.net/2429/36583 |