The Architecture of Farsi Knowledge Graph System

Knowledge graphs play an important role in the Semantic Web and in Natural Language Processing (NLP) tools. Knowledge bases exist in many languages; however, the lack of a Farsi-specific knowledge base has caused shortcomings in research and industrial applications. In this study, the most comp...

Bibliographic Details
Main Authors: Mohamad Bagher Sajadi, Behrouz Minaei Bidgoli
Format: Article
Language: Persian (fas)
Published: Iranian Research Institute for Information and Technology 2020-03-01
Series: Iranian Journal of Information Processing & Management
Subjects:
rdf
Online Access: http://jipm.irandoc.ac.ir/article-1-4190-en.html
id doaj-63b3b53b5506412c8b8472ac1d273311
record_format Article
spelling doaj-63b3b53b5506412c8b8472ac1d273311
2020-11-25T02:06:28Z
fas
Iranian Research Institute for Information and Technology
Iranian Journal of Information Processing & Management
2251-8223
2251-8231
2020-03-01
35(2): 425-462
The Architecture of Farsi Knowledge Graph System
Mohamad Bagher Sajadi (0)
Behrouz Minaei Bidgoli (1)
(0) Department of Computer Engineering, University of Science and Technology, Tehran, Iran.
(1) Department of Computer Engineering, University of Science and Technology, Tehran, Iran.
Knowledge graphs play an important role in the Semantic Web and in Natural Language Processing (NLP) tools. Knowledge bases exist in many languages; however, the lack of a Farsi-specific knowledge base has caused shortcomings in research and industrial applications. This study presents the most comprehensive knowledge base in the Farsi language, which consists of more than 500K entities and 7 million relations and is available as open source. Data is supplied by three sources: Farsi Wikipedia and its structured data such as infoboxes, Web tables, and a relation extraction module. In line with the Semantic Web, the RDF data model and an OWL 2 ontology were employed to implement the Farsi Knowledge Graph (FKG). Resources and their relations are stored in triple format; therefore, access to the knowledge graph is provided through a SPARQL endpoint. An ontology, derived from the DBpedia ontology, was developed and improved based on the resources of Farsi Wikipedia. In addition, more than 8000 Wikipedia templates and properties were mapped to the ontology, both automatically and manually. Furthermore, part of the ontology was mapped to FarsNet, the Persian WordNet, for research purposes. The graph contains a large amount of information on a variety of topics, including famous people, important places, organizations and companies, literary and artistic works, physiology, biology, events, species, astronomy, etc. Following Linked Data practice, most entities in the FKG have been connected to DBpedia and Wikidata resources by owl:sameAs. To achieve high performance and a flexible data model, a two-level storage architecture was designed to separate data from metadata. This design plays a key role in update operations and version management. For evaluation purposes, a small sample of triples was randomly collected to build a test dataset for manual inspection. Experimental results demonstrate that more than 94% of triples were obtained correctly through the process of extraction, conversion, mapping, transformation, and storage. According to the Semantic Web vision, the future internet will be a huge and complex global knowledge base; therefore, the FKG can play a significant role in defining and developing this emerging technology.
http://jipm.irandoc.ac.ir/article-1-4190-en.html
knowledge base
rdf
semantic web
farsi language
linked data
collection DOAJ
language fas
format Article
sources DOAJ
author Mohamad Bagher Sajadi
Behrouz Minaei Bidgoli
title The Architecture of Farsi Knowledge Graph System
publisher Iranian Research Institute for Information and Technology
series Iranian Journal of Information Processing & Management
issn 2251-8223
2251-8231
publishDate 2020-03-01
description Knowledge graphs play an important role in the Semantic Web and in Natural Language Processing (NLP) tools. Knowledge bases exist in many languages; however, the lack of a Farsi-specific knowledge base has caused shortcomings in research and industrial applications. This study presents the most comprehensive knowledge base in the Farsi language, which consists of more than 500K entities and 7 million relations and is available as open source. Data is supplied by three sources: Farsi Wikipedia and its structured data such as infoboxes, Web tables, and a relation extraction module. In line with the Semantic Web, the RDF data model and an OWL 2 ontology were employed to implement the Farsi Knowledge Graph (FKG). Resources and their relations are stored in triple format; therefore, access to the knowledge graph is provided through a SPARQL endpoint. An ontology, derived from the DBpedia ontology, was developed and improved based on the resources of Farsi Wikipedia. In addition, more than 8000 Wikipedia templates and properties were mapped to the ontology, both automatically and manually. Furthermore, part of the ontology was mapped to FarsNet, the Persian WordNet, for research purposes. The graph contains a large amount of information on a variety of topics, including famous people, important places, organizations and companies, literary and artistic works, physiology, biology, events, species, astronomy, etc. Following Linked Data practice, most entities in the FKG have been connected to DBpedia and Wikidata resources by owl:sameAs. To achieve high performance and a flexible data model, a two-level storage architecture was designed to separate data from metadata. This design plays a key role in update operations and version management. For evaluation purposes, a small sample of triples was randomly collected to build a test dataset for manual inspection. Experimental results demonstrate that more than 94% of triples were obtained correctly through the process of extraction, conversion, mapping, transformation, and storage. According to the Semantic Web vision, the future internet will be a huge and complex global knowledge base; therefore, the FKG can play a significant role in defining and developing this emerging technology.
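The storage design summarized in the description (triples kept apart from their metadata, with owl:sameAs links to external resources) can be illustrated with a minimal Python sketch. It is not the FKG implementation: the class TwoLevelStore, its methods, the fkg:, dbr:, and wd: identifiers, and the source and version fields are all hypothetical names chosen for illustration, and a real deployment would use an RDF store queried through SPARQL rather than in-memory Python sets.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


class TwoLevelStore:
    """Toy store (hypothetical): level 1 holds triples, level 2 holds their metadata."""

    def __init__(self) -> None:
        self._data: Set[Triple] = set()                   # level 1: the facts
        self._meta: Dict[Triple, Dict[str, str]] = {}     # level 2: provenance, version

    def add(self, triple: Triple, source: str, version: str) -> None:
        # Re-adding a known triple leaves level 1 unchanged and only
        # overwrites its metadata, which keeps updates cheap.
        self._data.add(triple)
        self._meta[triple] = {"source": source, "version": version}

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        # None acts as a wildcard, loosely like a variable in a SPARQL pattern.
        return sorted(
            t for t in self._data
            if (s is None or t.subject == s)
            and (p is None or t.predicate == p)
            and (o is None or t.obj == o)
        )

    def metadata(self, triple: Triple) -> Dict[str, str]:
        return self._meta[triple]

    def __len__(self) -> int:
        return len(self._data)


OWL_SAME_AS = "owl:sameAs"

store = TwoLevelStore()
tehran_type = Triple("fkg:Tehran", "rdf:type", "fkg:City")  # hypothetical identifiers
store.add(tehran_type, source="infobox", version="v1")
store.add(Triple("fkg:Tehran", OWL_SAME_AS, "dbr:Tehran"), source="mapping", version="v1")
store.add(Triple("fkg:Tehran", OWL_SAME_AS, "wd:Q-placeholder"), source="mapping", version="v1")

# A later extraction run re-asserts the same fact under a new version label.
store.add(tehran_type, source="infobox", version="v2")

# Follow the owl:sameAs links from the local entity to external resources.
links = [t.obj for t in store.match(s="fkg:Tehran", p=OWL_SAME_AS)]
print(links)
print(store.metadata(tehran_type))
```

Because the version label lives only in the metadata level, the second add for the same triple changes no stored fact, which is one way to read the description's remark that separating data from metadata helps with updates and version management.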
topic knowledge base
rdf
semantic web
farsi language
linked data
url http://jipm.irandoc.ac.ir/article-1-4190-en.html