Disease causality extraction based on lexical semantics and document-clause frequency from biomedical literature
Abstract Background: Recently, research on human disease networks has succeeded and has helped clarify the relationships among various diseases. In most disease networks, however, the relationship between diseases is represented simply as an association. This representation resul...
Main Authors: | Dong-gi Lee, Hyunjung Shin |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2017-05-01 |
Series: | BMC Medical Informatics and Decision Making |
Subjects: | Disease causality, Text mining, Lexical semantics, Document-clause frequency |
Online Access: | http://link.springer.com/article/10.1186/s12911-017-0448-y |
id |
doaj-5665c78080a3488bbb6f9c812157525b |
---|---|
record_format |
Article |
spelling |
doaj-5665c78080a3488bbb6f9c812157525b 2020-11-24T22:50:04Z eng BMC BMC Medical Informatics and Decision Making 1472-6947 2017-05-01 17S119 10.1186/s12911-017-0448-y Disease causality extraction based on lexical semantics and document-clause frequency from biomedical literature Dong-gi Lee 0 Hyunjung Shin 1 Department of Industrial Engineering, Ajou University Department of Industrial Engineering, Ajou University Abstract Background: Recently, research on human disease networks has succeeded and has helped clarify the relationships among various diseases. In most disease networks, however, the relationship between diseases is represented simply as an association. This representation makes it difficult to identify prior diseases and their influence on posterior diseases. In this paper, we propose a causal disease network that captures disease causality through text mining of biomedical literature. Methods: To identify the causality between diseases, the proposed method includes two schemes. The first is the lexicon-based causality term strength, which assigns a causal strength to each of a variety of causality terms based on lexicon analysis. The second is the frequency-based causality strength, which determines the direction and strength of causality based on document and clause frequencies in the literature. Results: We applied the proposed method to 6,617,833 PubMed documents and chose 195 diseases to construct a causal disease network. From all possible pairs of disease nodes in the network, 1011 causal pairs among 149 diseases were extracted. The resulting network was compared with that of a previous study. In terms of both coverage and quality, the proposed method outperformed the existing method: it identified 2.7 times more causalities and showed higher correlation with associated diseases. Conclusions: The novelty of this research is that the proposed method circumvents the time and cost limitations of testing all possible causalities in biological experiments, and that it advances text mining by defining the concept of causality term strength. http://link.springer.com/article/10.1186/s12911-017-0448-y Disease causality Text mining Lexical semantics Document-clause frequency |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Dong-gi Lee Hyunjung Shin |
title |
Disease causality extraction based on lexical semantics and document-clause frequency from biomedical literature |
publisher |
BMC |
series |
BMC Medical Informatics and Decision Making |
issn |
1472-6947 |
publishDate |
2017-05-01 |
description |
Abstract Background: Recently, research on human disease networks has succeeded and has helped clarify the relationships among various diseases. In most disease networks, however, the relationship between diseases is represented simply as an association. This representation makes it difficult to identify prior diseases and their influence on posterior diseases. In this paper, we propose a causal disease network that captures disease causality through text mining of biomedical literature. Methods: To identify the causality between diseases, the proposed method includes two schemes. The first is the lexicon-based causality term strength, which assigns a causal strength to each of a variety of causality terms based on lexicon analysis. The second is the frequency-based causality strength, which determines the direction and strength of causality based on document and clause frequencies in the literature. Results: We applied the proposed method to 6,617,833 PubMed documents and chose 195 diseases to construct a causal disease network. From all possible pairs of disease nodes in the network, 1011 causal pairs among 149 diseases were extracted. The resulting network was compared with that of a previous study. In terms of both coverage and quality, the proposed method outperformed the existing method: it identified 2.7 times more causalities and showed higher correlation with associated diseases. Conclusions: The novelty of this research is that the proposed method circumvents the time and cost limitations of testing all possible causalities in biological experiments, and that it advances text mining by defining the concept of causality term strength. |
topic |
Disease causality Text mining Lexical semantics Document-clause frequency |
url |
http://link.springer.com/article/10.1186/s12911-017-0448-y |