Self-Supervised Chinese Ontology Learning from Online Encyclopedias

Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a Chinese ontology built with self-supervised learning, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for o...


Bibliographic Details
Main Authors: Fanghuai Hu, Zhiqing Shao, Tong Ruan
Format: Article
Language: English
Published: Hindawi Limited 2014-01-01
Series: The Scientific World Journal
Online Access: http://dx.doi.org/10.1155/2014/848631
id doaj-e4c8dcabd71a47818c3cd8b1fd3b1673
record_format Article
spelling doaj-e4c8dcabd71a47818c3cd8b1fd3b1673
2020-11-24T21:52:46Z
eng
Hindawi Limited
The Scientific World Journal
2356-6140
1537-744X
2014-01-01
2014
10.1155/2014/848631
848631
Self-Supervised Chinese Ontology Learning from Online Encyclopedias
Fanghuai Hu, Zhiqing Shao, Tong Ruan (all: Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China)
Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a Chinese ontology built with self-supervised learning, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to convert the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid errors in the encyclopedias and enrich the learnt ontology, we also apply machine-learning-based methods. First, we show statistically and experimentally that self-supervised machine learning is practicable for Chinese relation extraction (at least for synonymy and hyponymy), and we train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are automatically generated from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO on two aspects, scale and precision: manual evaluation shows that the ontology has excellent precision, comparison with other well-known ontologies and knowledge bases indicates high coverage, and the experimental results also indicate that the self-supervised models clearly enrich SSCO.
http://dx.doi.org/10.1155/2014/848631
collection DOAJ
language English
format Article
sources DOAJ
author Fanghuai Hu
Zhiqing Shao
Tong Ruan
title Self-Supervised Chinese Ontology Learning from Online Encyclopedias
publisher Hindawi Limited
series The Scientific World Journal
issn 2356-6140
1537-744X
publishDate 2014-01-01
description Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a Chinese ontology built with self-supervised learning, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to convert the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid errors in the encyclopedias and enrich the learnt ontology, we also apply machine-learning-based methods. First, we show statistically and experimentally that self-supervised machine learning is practicable for Chinese relation extraction (at least for synonymy and hyponymy), and we train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are automatically generated from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO on two aspects, scale and precision: manual evaluation shows that the ontology has excellent precision, comparison with other well-known ontologies and knowledge bases indicates high coverage, and the experimental results also indicate that the self-supervised models clearly enrich SSCO.
url http://dx.doi.org/10.1155/2014/848631