Formalizing biomedical concepts from textual definitions

BACKGROUND: Ontologies play a major role in life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined so that it ensures global consistency and support of complex reasoning tasks. Most biomedi...


Bibliographic Details
Main Authors: Petrova, Alina, Ma, Yue, Tsatsaronis, George, Kissa, Maria, Distel, Felix, Baader, Franz, Schroeder, Michael
Other Authors: BioMed Central,
Format: Article
Language: English
Published: Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden 2016
Subjects:
Online Access: http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-191181
http://www.qucosa.de/fileadmin/data/qucosa/documents/19118/s13326-015-0015-3.pdf
id ndltd-DRESDEN-oai-qucosa.de-bsz-14-qucosa-191181
record_format oai_dc
spelling ndltd-DRESDEN-oai-qucosa.de-bsz-14-qucosa-191181 2016-03-04T03:33:17Z
Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden
BioMed Central, 2016-01-07
doc-type:article
application/pdf
urn:nbn:de:bsz:14-qucosa-191181
issn:2041-1480
PPN456894225
Journal of biomedical semantics, Vol. 6 (2015) Bd. 22, ISSN 2041-1480
eng
collection NDLTD
language English
format Article
sources NDLTD
topic Biomedizinische Ontologien
formale Definitionen
TU Dresden
Publikationsfonds
Biomedical ontologies
Formal definitions
MeSH
Relation extraction
SNOMED CT
Technical University Dresden
Publication funds
ddc:610
rvk:XA 10000
description BACKGROUND: Ontologies play a major role in life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined so that it ensures global consistency and support of complex reasoning tasks. Most biomedical ontologies and taxonomies, on the other hand, define concepts only textually, without the use of logic. Here, we investigate how to automatically generate formal concept definitions from textual ones. We develop a method that uses machine learning in combination with several types of lexical and semantic features and outputs formal definitions that follow the structure of SNOMED CT concept definitions. RESULTS: We evaluate our method on three benchmarks and test both the underlying relation extraction component and the overall quality of output concept definitions. In addition, we provide an analysis of the following aspects: (1) How do definitions mined from the Web and literature differ from the ones mined from manually created definitions, e.g., MeSH? (2) How do different feature representations, e.g., the restrictions of relations' domain and range, impact on the generated definition quality? (3) How do different machine learning algorithms compare to each other for the task of formal definition generation? (4) What is the influence of the learning data size on the task? We discuss all of these settings in detail and show that the suggested approach can achieve success rates of over 90%. In addition, the results show that the choice of corpora, lexical features, learning algorithm and data size does not impact the performance as strongly as semantic types do. Semantic types limit the domain and range of a predicted relation, and as long as relations' domain and range pairs do not overlap, this information is most valuable in formalizing textual definitions.
CONCLUSIONS: The analysis presented in this manuscript implies that automated methods can provide a valuable contribution to the formalization of biomedical knowledge, thus paving the way for future applications that go beyond retrieval and into complex reasoning. The method is implemented and accessible to the public from: https://github.com/alifahsyamsiyah/learningDL.
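The abstract's observation about semantic types can be illustrated with a minimal sketch. It is not the authors' implementation (their code is at the GitHub link above). The relation names, semantic type labels, and the `ALLOWED` table below are hypothetical, chosen only to show how domain and range restrictions narrow the set of candidate relations, and why non-overlapping domain and range pairs leave at most one candidate.

```python
# Hypothetical domain/range table: relation name -> allowed (domain type, range type) pairs.
# The names are illustrative assumptions, not data taken from the paper.
ALLOWED = {
    "finding_site": {("Disorder", "Body structure")},
    "causative_agent": {("Disorder", "Organism"), ("Disorder", "Substance")},
    "associated_morphology": {("Disorder", "Morphologic abnormality")},
}


def candidate_relations(domain_type, range_type, allowed=ALLOWED):
    """Return the relations whose restriction admits this pair of semantic types."""
    return sorted(
        rel for rel, pairs in allowed.items() if (domain_type, range_type) in pairs
    )


def pairs_overlap(allowed=ALLOWED):
    """Return True if any (domain, range) pair is shared by two relations."""
    seen = set()
    for pairs in allowed.values():
        if seen & pairs:
            return True
        seen |= pairs
    return False


if __name__ == "__main__":
    # With non-overlapping pairs, the semantic types alone select the relation.
    print(candidate_relations("Disorder", "Body structure"))
    print(pairs_overlap())
```

When `pairs_overlap` returns False, each pair of semantic types admits at most one relation, so a classifier only has to decide whether a relation is expressed at all. When pairs overlap, the lexical features have to separate the remaining candidates.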
author2 BioMed Central,
author Petrova, Alina
Ma, Yue
Tsatsaronis, George
Kissa, Maria
Distel, Felix
Baader, Franz
Schroeder, Michael
author_sort Petrova, Alina
title Formalizing biomedical concepts from textual definitions
publisher Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden
publishDate 2016
url http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-191181
http://www.qucosa.de/fileadmin/data/qucosa/documents/19118/s13326-015-0015-3.pdf