Property-Based Semantic Similarity Criteria to Evaluate the Overlaps of Schemas

Knowledge graph-based data integration is a practical methodology for heterogeneous legacy database-integrated service construction. However, it is neither efficient nor economical to build a new cross-domain knowledge graph on top of the schemas of each legacy database for the specific integration...

Full description

Bibliographic Details
Main Authors: Lan Huang, Yuanwei Zhao, Bo Wang, Dongxu Zhang, Rui Zhang, Subhashis Das, Simone Bocca, Fausto Giunchiglia
Format: Article
Language:English
Published: MDPI AG 2021-08-01
Series:Algorithms
Subjects:
Online Access:https://www.mdpi.com/1999-4893/14/8/241
Description
Summary:Knowledge graph-based data integration is a practical methodology for heterogeneous legacy database-integrated service construction. However, it is neither efficient nor economical to build a new cross-domain knowledge graph on top of the schemas of each legacy database for the specific integration application rather than reusing the existing high-quality knowledge graphs. Consequently, a question arises as to whether the existing knowledge graph is compatible with cross-domain queries and with heterogenous schemas of the legacy systems. An effective criterion is urgently needed in order to evaluate such compatibility as it limits the quality upbound of the integration. This research studies the semantic similarity of the schemas from the aspect of properties. It provides a set of in-depth criteria, namely coverage and flexibility, to evaluate the pairwise compatibility between the schemas. It takes advantage of the properties of knowledge graphs to evaluate the overlaps between schemas and defines the weights of entity types in order to perform precise compatibility computation. The effectiveness of the criteria obtained to evaluate the compatibility between knowledge graphs and cross-domain queries is demonstrated using a case study.
ISSN:1999-4893