The Architecture of Farsi Knowledge Graph System
Knowledge graphs play an important role in Semantic Web and Natural Language Processing (NLP) tools. Knowledge bases exist for many languages; however, the lack of a Farsi-specific knowledge base has hampered research and industrial applications. In this study, the most comp...
Main Authors: | Mohamad Bagher Sajadi, Behrouz Minaei Bidgoli |
---|---|
Format: | Article |
Language: | fas |
Published: | Iranian Research Institute for Information and Technology, 2020-03-01 |
Series: | Iranian Journal of Information Processing & Management |
Subjects: | knowledge base; RDF; semantic web; Farsi language; linked data |
Online Access: | http://jipm.irandoc.ac.ir/article-1-4190-en.html |
id |
doaj-63b3b53b5506412c8b8472ac1d273311 |
---|---|
record_format |
Article |
spelling |
doaj-63b3b53b5506412c8b8472ac1d273311; 2020-11-25T02:06:28Z; fas; Iranian Research Institute for Information and Technology; Iranian Journal of Information Processing & Management; 2251-8223; 2251-8231; 2020-03-01; 352425462; The Architecture of Farsi Knowledge Graph System; Mohamad Bagher Sajadi (0), Behrouz Minaei Bidgoli (1); (0) Department of Computer Engineering, University of Science and Technology, Tehran, Iran; (1) Department of Computer Engineering, University of Science and Technology, Tehran, Iran; http://jipm.irandoc.ac.ir/article-1-4190-en.html; knowledge base; rdf; semantic web; farsi language; linked data |
collection |
DOAJ |
language |
fas |
format |
Article |
sources |
DOAJ |
author |
Mohamad Bagher Sajadi; Behrouz Minaei Bidgoli |
author_sort |
Mohamad Bagher Sajadi |
title |
The Architecture of Farsi Knowledge Graph System |
title_sort |
architecture of farsi knowledge graph system |
publisher |
Iranian Research Institute for Information and Technology |
series |
Iranian Journal of Information Processing & Management |
issn |
2251-8223 2251-8231 |
publishDate |
2020-03-01 |
description |
Knowledge graphs play an important role in Semantic Web and Natural Language Processing (NLP) tools. Knowledge bases exist for many languages; however, the lack of a Farsi-specific knowledge base has hampered research and industrial applications. This study presents the most comprehensive knowledge base in the Farsi language, which consists of more than 500K entities and 7 million relations and is available as open source. Data is supplied by three sources: Farsi Wikipedia and its structured data such as infoboxes, Web tables, and a relation extraction module. In line with the Semantic Web, the RDF data model and an OWL 2 ontology were employed to implement the Farsi Knowledge Graph (FKG). Resources and their relations are stored as triples; therefore, access to the knowledge graph is provided through a SPARQL endpoint. An ontology derived from the DBpedia ontology was developed and improved based on the resources of Farsi Wikipedia. In addition, more than 8000 Wikipedia templates and properties were mapped to the ontology, both automatically and manually. Furthermore, part of the ontology was mapped to FarsNet, the Persian WordNet, for research purposes. The graph contains a large amount of information on a variety of topics, including famous people, important places, organizations and companies, literary and art works, physiology, biology, events, species, and astronomy. Following Linked Data principles, most entities in the FKG have been connected to DBpedia and Wikidata resources by owl:sameAs. To achieve high performance and a flexible data model, a two-level storage architecture was designed to separate data from metadata. This design plays a key role in update operations and version management. For evaluation, a small portion of the triples was randomly sampled to build a test dataset for manual inspection. Experimental results show that more than 94% of triples were obtained correctly through the process of extraction, conversion, mapping, transformation, and storage. According to the Semantic Web vision, the future Internet will be a huge and complex global knowledge base; therefore, the FKG can play a significant role in defining and developing this emerging technology. |
topic |
knowledge base; RDF; semantic web; Farsi language; linked data |
url |
http://jipm.irandoc.ac.ir/article-1-4190-en.html |
_version_ |
1724933741962330112 |
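
The abstract in this record describes triples stored separately from their metadata and entities linked to DBpedia and Wikidata through owl:sameAs. The following is a minimal sketch of that idea, not the authors' implementation: the class `TwoLevelStore`, its methods, the `fkg:` and `dbr:` prefixes, and the `source` and `version` metadata keys are all hypothetical names chosen for illustration, and a plain in-memory set stands in for whatever triple store and SPARQL endpoint the FKG actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
OWL_SAME_AS = "owl:sameAs"


@dataclass
class TwoLevelStore:
    """Toy store (hypothetical): triples on one level, provenance on another."""

    # Data level: the bare (subject, predicate, object) triples.
    data: Set[Triple] = field(default_factory=set)
    # Metadata level: per-triple provenance, kept apart from the data.
    metadata: Dict[Triple, Dict[str, str]] = field(default_factory=dict)

    def add(self, s: str, p: str, o: str, source: str, version: str) -> None:
        """Store a triple and record where it came from (assumed metadata keys)."""
        triple = (s, p, o)
        self.data.add(triple)
        self.metadata[triple] = {"source": source, "version": version}

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples matching a pattern; None acts as a wildcard."""
        for triple in sorted(self.data):
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple

    def same_as(self, entity: str) -> Set[str]:
        """Return the external resources linked to an entity by owl:sameAs."""
        return {o for _, _, o in self.match(s=entity, p=OWL_SAME_AS)}

    def from_source(self, source: str) -> Set[Triple]:
        """Select triples by provenance without touching the data level."""
        return {t for t, meta in self.metadata.items()
                if meta["source"] == source}


if __name__ == "__main__":
    store = TwoLevelStore()
    store.add("fkg:Tehran", "rdf:type", "fkg:City", "infobox", "v1")
    store.add("fkg:Tehran", OWL_SAME_AS, "dbr:Tehran", "linking", "v1")
    print(store.same_as("fkg:Tehran"))
    print(store.from_source("infobox"))
```

Keeping provenance in a separate mapping means a triple can be re-attributed or re-versioned without rewriting the data level, which is one plausible reading of why the abstract ties the two-level design to updates and version management.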