Formalizing biomedical concepts from textual definitions
Background Ontologies play a major role in the life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined, which ensures global consistency and supports complex reasoning tasks. Most biomedic...
Main Authors: | Tsatsaronis, George; Ma, Yue; Petrova, Alina; Kissa, Maria; Distel, Felix; Baader, Franz; Schroeder, Michael |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: | Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016 |
Subjects: | |
Online Access: | http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-192186 http://www.qucosa.de/fileadmin/data/qucosa/documents/19218/13326_2015_Article_15.pdf |
id |
ndltd-DRESDEN-oai-qucosa.de-bsz-14-qucosa-192186 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-DRESDEN-oai-qucosa.de-bsz-14-qucosa-192186 2016-01-05T03:30:10Z Formalizing biomedical concepts from textual definitions Tsatsaronis, George Ma, Yue Petrova, Alina Kissa, Maria Distel, Felix Baader, Franz Schroeder, Michael formale Definition biomedizinische Ontologien TU Dresden Publikationsfonds Formal definitions Biomedical ontologies Relation extraction SNOMED CT MeSH Technical University Dresden Publication funds ddc:570 rvk:WH 3100 Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden Journal of Biomedical Semantics, 2016-01-04 doc-type:article application/pdf http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-192186 urn:nbn:de:bsz:14-qucosa-192186 issn:2041-1480 http://www.qucosa.de/fileadmin/data/qucosa/documents/19218/13326_2015_Article_15.pdf Journal of Biomedical Semantics 2015, 6:22, ISSN 2041-1480 eng |
collection |
NDLTD |
language |
English |
format |
Article |
sources |
NDLTD |
topic |
formale Definition biomedizinische Ontologien TU Dresden Publikationsfonds Formal definitions Biomedical ontologies Relation extraction SNOMED CT MeSH Technical University Dresden Publication funds ddc:570 rvk:WH 3100 |
spellingShingle |
formale Definition biomedizinische Ontologien TU Dresden Publikationsfonds Formal definitions Biomedical ontologies Relation extraction SNOMED CT MeSH Technical University Dresden Publication funds ddc:570 rvk:WH 3100 Tsatsaronis, George Ma, Yue Petrova, Alina Kissa, Maria Distel, Felix Baader, Franz Schroeder, Michael Formalizing biomedical concepts from textual definitions |
description |
Background
Ontologies play a major role in the life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined, which ensures global consistency and supports complex reasoning tasks. Most biomedical ontologies and taxonomies, on the other hand, define concepts only textually, without the use of logic. Here, we investigate how to automatically generate formal concept definitions from textual ones. We develop a method that combines machine learning with several types of lexical and semantic features and outputs formal definitions that follow the structure of SNOMED CT concept definitions.
Results
We evaluate our method on three benchmarks and test both the underlying relation extraction component and the overall quality of the output concept definitions. In addition, we analyze the following aspects: (1) How do definitions mined from the Web and literature differ from those mined from manually created definitions, e.g., MeSH? (2) How do different feature representations, e.g., restrictions on relations’ domain and range, affect the quality of the generated definitions? (3) How do different machine learning algorithms compare with each other on the task of formal definition generation? (4) How does the size of the learning data influence the task? We discuss all of these settings in detail and show that the suggested approach can achieve success rates of over 90%. In addition, the results show that the choice of corpora, lexical features, learning algorithm and data size does not impact performance as strongly as semantic types do. Semantic types limit the domain and range of a predicted relation, and as long as relations’ domain and range pairs do not overlap, this information is most valuable in formalizing textual definitions.
Conclusions
The analysis presented in this manuscript implies that automated methods can provide a valuable contribution to the formalization of biomedical knowledge, thus paving the way for future applications that go beyond retrieval and into complex reasoning. The method is implemented and accessible to the public from: https://github.com/alifahsyamsiyah/learningDL. |
author2 |
Journal of Biomedical Semantics, |
author_facet |
Journal of Biomedical Semantics, Tsatsaronis, George Ma, Yue Petrova, Alina Kissa, Maria Distel, Felix Baader, Franz Schroeder, Michael |
author |
Tsatsaronis, George Ma, Yue Petrova, Alina Kissa, Maria Distel, Felix Baader, Franz Schroeder, Michael |
author_sort |
Tsatsaronis, George |
title |
Formalizing biomedical concepts from textual definitions |
title_short |
Formalizing biomedical concepts from textual definitions |
title_full |
Formalizing biomedical concepts from textual definitions |
title_fullStr |
Formalizing biomedical concepts from textual definitions |
title_full_unstemmed |
Formalizing biomedical concepts from textual definitions |
title_sort |
formalizing biomedical concepts from textual definitions |
publisher |
Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden |
publishDate |
2016 |
url |
http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-192186 http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-192186 http://www.qucosa.de/fileadmin/data/qucosa/documents/19218/13326_2015_Article_15.pdf |
work_keys_str_mv |
AT tsatsaronisgeorge formalizingbiomedicalconceptsfromtextualdefinitions AT mayue formalizingbiomedicalconceptsfromtextualdefinitions AT petrovaalina formalizingbiomedicalconceptsfromtextualdefinitions AT kissamaria formalizingbiomedicalconceptsfromtextualdefinitions AT distelfelix formalizingbiomedicalconceptsfromtextualdefinitions AT baaderfranz formalizingbiomedicalconceptsfromtextualdefinitions AT schroedermichael formalizingbiomedicalconceptsfromtextualdefinitions |
_version_ |
1718160364059754496 |