Summary: | Data is now generated, collected, and analyzed at an unprecedented scale. Data integration is the problem of combining data from heterogeneous, autonomous data sources and providing users with a unified view of the integrated data. Designing a data integration framework requires addressing challenges such as schema mapping, data cleaning, record linkage, and data fusion. In this paper, we briefly introduce traditional data integration approaches and then propose a novel graph-based data integration framework, built on a unified concept model (UCM), to address real-world refueling data integration problems. Within this framework, schema mapping is carried out and metadata from heterogeneous sources is integrated into a UCM. The UCM is easy to update and is also important for effective schema mapping and data transformation. Following the structure of the UCM, data from different sources is automatically transformed into instance data and linked together using semantic similarity metrics; finally, the data is stored in a graph database. Experiments are carried out on heterogeneous data from refueling records, social networks of astroturfers, and vehicle trajectories. The experimental results and demonstrations of a reference implementation show that the proposed framework achieves good precision and recall.
|