Disease causality extraction based on lexical semantics and document-clause frequency from biomedical literature

Abstract Background: Research on human disease networks has recently advanced and now helps clarify the relationships between various diseases. In most disease networks, however, the relationship between diseases is represented simply as an association. This representation resul...

Bibliographic Details
Main Authors: Dong-gi Lee, Hyunjung Shin
Format: Article
Language: English
Published: BMC 2017-05-01
Series: BMC Medical Informatics and Decision Making
Subjects: Disease causality; Text mining; Lexical semantics; Document-clause frequency
Online Access: http://link.springer.com/article/10.1186/s12911-017-0448-y
id doaj-5665c78080a3488bbb6f9c812157525b
record_format Article
spelling doaj-5665c78080a3488bbb6f9c812157525b | 2020-11-24T22:50:04Z | eng | BMC | BMC Medical Informatics and Decision Making | 1472-6947 | 2017-05-01 | 17S119 | 10.1186/s12911-017-0448-y | Disease causality extraction based on lexical semantics and document-clause frequency from biomedical literature | Dong-gi Lee (0), Hyunjung Shin (1) | (0) Department of Industrial Engineering, Ajou University; (1) Department of Industrial Engineering, Ajou University | http://link.springer.com/article/10.1186/s12911-017-0448-y | Disease causality; Text mining; Lexical semantics; Document-clause frequency
collection DOAJ
language English
format Article
sources DOAJ
author Dong-gi Lee
Hyunjung Shin
title Disease causality extraction based on lexical semantics and document-clause frequency from biomedical literature
publisher BMC
series BMC Medical Informatics and Decision Making
issn 1472-6947
publishDate 2017-05-01
description Abstract
Background: Research on human disease networks has recently advanced and now helps clarify the relationships between various diseases. In most disease networks, however, the relationship between diseases is represented simply as an association. This representation makes it difficult to identify prior diseases and their influence on posterior diseases. In this paper, we propose a causal disease network that captures disease causality through text mining of the biomedical literature.
Methods: To identify causality between diseases, the proposed method combines two schemes. The first is the lexicon-based causality term strength, which assigns a causal strength to each of a variety of causality terms based on lexicon analysis. The second is the frequency-based causality strength, which determines the direction and strength of causality from document and clause frequencies in the literature.
Results: We applied the proposed method to 6,617,833 documents from the PubMed literature and chose 195 diseases to construct a causal disease network. From all possible pairs of disease nodes in the network, 1011 causal pairs involving 149 diseases were extracted. The resulting network was compared with that of a previous study. The proposed method performed better in both coverage and quality: it identified 2.7 times more causalities and showed a higher correlation with associated diseases than the existing method.
Conclusions: The novelty of this research is twofold. The proposed method circumvents the time and cost limitations of testing all possible causalities in biological experiments, and it advances text mining by defining the concept of causality term strength.
topic Disease causality
Text mining
Lexical semantics
Document-clause frequency
url http://link.springer.com/article/10.1186/s12911-017-0448-y
_version_ 1725673498025656320
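
The Methods paragraph of the abstract names two ingredients: a lexicon-based strength for causality terms, and a frequency-based strength derived from document and clause counts. This record does not give the formulas, so the Python sketch below is only an illustration under stated assumptions. The weights in `TERM_STRENGTH`, the naive clause splitter, the literal "A term B" pattern, and the scoring rule (summed term strength over matching clauses, scaled by the share of documents containing a match) are all hypothetical and are not the authors' definitions.

```python
import re
from collections import defaultdict

# Hypothetical lexicon: causality term -> strength. Values are illustrative only.
TERM_STRENGTH = {
    "causes": 1.0,
    "leads to": 0.8,
    "is associated with": 0.3,
}


def split_clauses(text):
    """Naive clause splitter: break on sentence punctuation, commas, semicolons."""
    return [c.strip().lower() for c in re.split(r"[.;,!?]", text) if c.strip()]


def causal_scores(documents, diseases):
    """Return {(cause, effect): score} for ordered disease pairs.

    Assumed scoring rule (not taken from the paper):
        score = (sum of term strengths over matching clauses)
                * (documents with at least one match / total documents)
    """
    clause_weight = defaultdict(float)  # summed term strength per ordered pair
    doc_hits = defaultdict(set)         # documents supporting each ordered pair
    for doc_id, text in enumerate(documents):
        for clause in split_clauses(text):
            for term, strength in TERM_STRENGTH.items():
                for a in diseases:
                    for b in diseases:
                        if a != b and f"{a} {term} {b}" in clause:
                            clause_weight[(a, b)] += strength
                            doc_hits[(a, b)].add(doc_id)
    n_docs = len(documents) or 1
    return {
        pair: clause_weight[pair] * len(doc_hits[pair]) / n_docs
        for pair in clause_weight
    }


def direction(scores, a, b):
    """Return the ordered pair with the larger score, or None on a tie."""
    ab = scores.get((a, b), 0.0)
    ba = scores.get((b, a), 0.0)
    if ab == ba:
        return None
    return (a, b) if ab > ba else (b, a)


# Toy corpus, invented for this example.
docs = [
    "Obesity causes diabetes. Diabetes is associated with obesity.",
    "In this cohort, obesity leads to diabetes.",
]
scores = causal_scores(docs, ["obesity", "diabetes"])
print(direction(scores, "obesity", "diabetes"))  # -> ('obesity', 'diabetes')
```

In this toy corpus the pair (obesity, diabetes) is supported by stronger terms and by more documents than the reverse pair, so the sketch orients the edge from obesity to diabetes. A real pipeline would need proper disease-name recognition and clause parsing in place of the substring match used here.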