A Novel Data Integration Framework Based on Unified Concept Model

Bibliographic Details
Main Authors: Bo Ma, Tonghai Jiang, Xi Zhou, Fan Zhao, Yating Yang
Format: Article
Language: English
Published: IEEE 2017-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/7862127/
Description
Summary: Data is now being generated, collected, and analyzed at an unprecedented scale. Data integration is the problem of combining data from heterogeneous, autonomous data sources and providing users with a unified view of the integrated data. Designing a data integration framework requires addressing challenges such as schema mapping, data cleaning, record linkage, and data fusion. In this paper, we briefly introduce the traditional data integration approaches and then propose a novel graph-based data integration framework, built on a unified concept model (UCM), to address real-world refueling data integration problems. Within this framework, schema mapping is carried out and metadata from heterogeneous sources is integrated into a UCM. The UCM is easy to update and is also important for effective schema mapping and data transformation. Following the structure of the UCM, data from different sources is automatically transformed into instance data and linked together using semantic similarity computation metrics. Finally, the data is stored in a graph database. Experiments are carried out on heterogeneous data from refueling records, social networks of astroturfers, and vehicle trajectories. Experimental results and reference implementation demonstrations show good precision and recall for the proposed framework.
ISSN: 2169-3536
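
The pipeline outlined in the summary (map source fields onto a unified concept, transform records into instance data, then link instances by similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The concept mapping `UCM_VEHICLE`, the field names, the sample records, and the 0.9 threshold are all hypothetical, and the standard library's `difflib.SequenceMatcher` string ratio stands in for the paper's semantic similarity metrics. The resulting edge list only suggests what might be written to a graph database.

```python
from difflib import SequenceMatcher

# Hypothetical unified concept: maps each source's field names
# onto shared attribute names (an assumption, not from the paper).
UCM_VEHICLE = {
    "refueling": {"plate_no": "plate", "station": "location"},
    "trajectory": {"license": "plate", "gps_point": "location"},
}


def to_instance(source, record):
    """Rename source-specific fields to the unified attribute names."""
    mapping = UCM_VEHICLE[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1].

    A simple stand-in for a semantic similarity metric.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(instances_a, instances_b, key="plate", threshold=0.9):
    """Return index pairs (i, j) whose key values are similar enough to link."""
    edges = []
    for i, a in enumerate(instances_a):
        for j, b in enumerate(instances_b):
            if similarity(a[key], b[key]) >= threshold:
                edges.append((i, j))
    return edges


# Made-up sample records from two heterogeneous sources.
refuels = [to_instance("refueling", {"plate_no": "XA-12345", "station": "S1"})]
tracks = [
    to_instance("trajectory", {"license": "xa-12345", "gps_point": "P7"}),
    to_instance("trajectory", {"license": "ZB-99999", "gps_point": "P9"}),
]

# Each edge links a refueling instance to a trajectory instance.
edges = link(refuels, tracks)
print(edges)
```

Keeping the field mapping in one dictionary reflects the summary's point that a unified model is easy to update: supporting a new source in this sketch would only require adding one more mapping entry.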