Self-Supervised Chinese Ontology Learning from Online Encyclopedias
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2014-01-01 |
Series: | The Scientific World Journal |
Online Access: | http://dx.doi.org/10.1155/2014/848631 |
Summary: | Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised-learning-based Chinese ontology that contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in the encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid the errors in the encyclopedias and to enrich the learnt ontology, we also apply several machine-learning-based methods. First, we show statistically and experimentally that self-supervised machine learning is practicable for Chinese relation extraction (at least for synonymy and hyponymy), and we train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are generated automatically from the structural information of the encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO on two aspects, scale and precision: manual evaluation shows that the ontology has high precision, comparison with other well-known ontologies and knowledge bases indicates high coverage, and the experimental results show that the self-supervised models substantially enrich SSCO. |
ISSN: | 2356-6140, 1537-744X |
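
The summary describes transferring InfoBox modules into ontological form. The following is a minimal sketch, not the authors' implementation, of the general idea: each InfoBox attribute-value pair on an article page becomes an (entity, attribute, value) fact. The `infobox_to_facts` function, its normalization steps, and the example data are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject entity, attribute, value)

def infobox_to_facts(title: str, infobox: Dict[str, str]) -> List[Triple]:
    """Map each InfoBox attribute-value pair to an (entity, attribute, value) fact."""
    facts: List[Triple] = []
    for attribute, value in infobox.items():
        # Light normalization only; a real pipeline would also resolve internal
        # links, split multi-valued fields, and map attributes onto ontology
        # properties.
        attribute = attribute.strip()
        value = value.strip()
        if attribute and value:
            facts.append((title, attribute, value))
    return facts

# Hypothetical example entry for an encyclopedia article about Shanghai:
print(infobox_to_facts("上海", {"人口": "2415万", "面积": "6340平方千米"}))
```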
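
The summary also notes that all training examples are generated automatically from the structural information of the encyclopedias plus a few general heuristic rules. The sketch below, under the assumption that redirection pages encode alias/synonym pairs, builds a labeled set for a synonymy classifier; `build_synonym_training_set`, the random negative-sampling heuristic, and the data shapes are hypothetical stand-ins, not the paper's exact rules.

```python
import random
from typing import Dict, List, Tuple

def build_synonym_training_set(
    redirects: Dict[str, str],   # redirect title -> target article title
    all_titles: List[str],
    negative_ratio: int = 1,
    seed: int = 0,
) -> List[Tuple[str, str, int]]:
    """Return (title_a, title_b, label) pairs; label 1 = synonym, 0 = not."""
    rng = random.Random(seed)
    examples: List[Tuple[str, str, int]] = []
    # Positive examples: a redirect and its target are treated as synonyms.
    for alias, target in redirects.items():
        if alias != target:
            examples.append((alias, target, 1))
    # Negative examples: random title pairs not linked by any redirect
    # (a deliberately simple heuristic for illustration).
    for _ in range(negative_ratio * len(examples)):
        a, b = rng.sample(all_titles, 2)
        if redirects.get(a) != b and redirects.get(b) != a:
            examples.append((a, b, 0))
    return examples

# Hypothetical redirect data: "申城" and "魔都" both redirect to "上海".
redirects = {"申城": "上海", "魔都": "上海"}
titles = ["上海", "北京", "申城", "魔都", "广州"]
print(build_synonym_training_set(redirects, titles))
```

Such automatically labeled pairs would then feed the self-supervised classifiers (e.g., SVMs) mentioned in the summary, without any manual annotation.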