High Performance Methods for Linked Open Data Connectivity Analytics

The main objective of Linked Data is linking and integration, and a major step in evaluating whether this objective has been reached is to find all the connections among the Linked Open Data (LOD) Cloud datasets. Connectivity among two or more datasets can be achieved through common Entities, Triples...

Bibliographic Details
Main Authors: Michalis Mountantonakis, Yannis Tzitzikas
Format: Article
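Language: English
Published: MDPI AG 2018-06-01
Series: Information
Subjects: content-based connectivity measurements; semantic web; linked data; dataset discovery; information enrichment; LOD scale analytics; lattice of measurements; MapReduce; big data
Online Access: http://www.mdpi.com/2078-2489/9/6/134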
id doaj-6d135911e2cc4c7fbf99a047a8ad909e
record_format Article
spelling doaj-6d135911e2cc4c7fbf99a047a8ad909e
2020-11-24T22:19:09Z
eng
MDPI AG
Information
ISSN 2078-2489
2018-06-01
Volume 9, Issue 6, Article 134
DOI 10.3390/info9060134
info9060134
High Performance Methods for Linked Open Data Connectivity Analytics
Michalis Mountantonakis (Institute of Computer Science, FORTH-ICS, Heraklion 70013, Greece)
Yannis Tzitzikas (Institute of Computer Science, FORTH-ICS, Heraklion 70013, Greece)
http://www.mdpi.com/2078-2489/9/6/134
content-based connectivity measurements; semantic web; linked data; dataset discovery; information enrichment; LOD scale analytics; lattice of measurements; MapReduce; big data
collection DOAJ
language English
format Article
sources DOAJ
author Michalis Mountantonakis
Yannis Tzitzikas
title High Performance Methods for Linked Open Data Connectivity Analytics
publisher MDPI AG
series Information
issn 2078-2489
publishDate 2018-06-01
description The main objective of Linked Data is linking and integration, and a major step in evaluating whether this objective has been reached is to find all the connections among the Linked Open Data (LOD) Cloud datasets. Connectivity among two or more datasets can be achieved through common Entities, Triples, Literals, and Schema Elements. Further connections arise from equivalence relationships between URIs, such as owl:sameAs, owl:equivalentProperty, and owl:equivalentClass, which many publishers use to declare that their URIs are equivalent to URIs of other datasets. However, no connectivity measurements (or indexes) are available that involve more than two datasets and cover either the whole content (e.g., entities, schema, triples) or “slices” (e.g., triples for a specific entity) of datasets, although such measurements can be of primary importance for several real-world tasks, such as Information Enrichment, Dataset Discovery, and others. Finding the connections among the datasets is not easy in general: the number of LOD datasets is large, and the transitive and symmetric closure of the equivalence relationships must be computed so that no connections are missed. For this reason, we introduce scalable methods and algorithms (a) for computing the transitive and symmetric closure of equivalence relationships (since they can produce more connections between the datasets); (b) for constructing dedicated global semantics-aware indexes that cover the whole content of datasets; and (c) for measuring the connectivity among two or more datasets. Finally, we evaluate the speedup of the proposed approach and report comparative results for over two billion triples.
topic content-based connectivity measurements
semantic web
linked data
dataset discovery
information enrichment
LOD scale analytics
lattice of measurements
MapReduce
big data
url http://www.mdpi.com/2078-2489/9/6/134