Analyzing a co-occurrence gene-interaction network to identify disease-gene association

Abstract. Background: Understanding genetic networks and their role in chronic diseases (e.g., cancer) is an important objective of biological research. In this work, we present a text mining system that constructs a gene-gene interaction network for the entire human genome and then pe...

Bibliographic Details
Main Authors: Amira Al-Aamri, Kamal Taha, Yousof Al-Hammadi, Maher Maalouf, Dirar Homouz
Format: Article
Language: English
Published: BMC 2019-02-01
Series: BMC Bioinformatics
Subjects: Text mining; Disease-gene association; Biological NLP; Biomedical literature; Genetic network
Online Access: http://link.springer.com/article/10.1186/s12859-019-2634-7
id doaj-d38ca3911476455c822fe2a7ac8d7d5a
record_format Article
spelling doaj-d38ca3911476455c822fe2a7ac8d7d5a 2020-11-25T00:33:49Z eng BMC BMC Bioinformatics 1471-2105 2019-02-01 201115 10.1186/s12859-019-2634-7
Amira Al-Aamri 0; Kamal Taha 1; Yousof Al-Hammadi 2; Maher Maalouf 3; Dirar Homouz 4
Department of Electrical and Computer Engineering; Department of Electrical and Computer Engineering; Department of Electrical and Computer Engineering; Department of Industrial and Systems Engineering; Department of Physics
collection DOAJ
language English
format Article
sources DOAJ
author Amira Al-Aamri
Kamal Taha
Yousof Al-Hammadi
Maher Maalouf
Dirar Homouz
title Analyzing a co-occurrence gene-interaction network to identify disease-gene association
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-02-01
description Abstract
Background: Understanding genetic networks and their role in chronic diseases (e.g., cancer) is an important objective of biological research. In this work, we present a text mining system that constructs a gene-gene interaction network for the entire human genome and then performs network analysis to identify disease-related genes. We identify interacting genes from their co-occurrence frequency in the biomedical literature, using linear and non-linear rare-event classification models. We analyze the constructed network with several centrality measures to assess the importance of each gene. Specifically, we apply betweenness, closeness, eigenvector, and degree centrality to rank the central genes of the network and to identify possible cancer-related genes.
Results: We evaluated the top 15 ranked genes for three cancer types (prostate, breast, and lung cancer). The average precision for identifying breast, prostate, and lung cancer genes varies between 80% and 100%. In a prostate cancer case study, the system predicted an average of 80% prostate-related genes.
Conclusions: The results show that our system has the potential to improve the accuracy of predicting gene-gene interactions and disease-gene associations. We also conduct a prostate cancer case study using the threshold property of logistic regression, and we compare our approach with some state-of-the-art methods.
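The pipeline outlined in the abstract (co-occurrence counting, network construction, centrality ranking) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus `DOCS`, the placeholder gene names, the function names, and in particular the fixed `min_count` cutoff, which stands in for the rare-event classification models the authors use to decide which gene pairs interact. Only degree and closeness centrality are shown; betweenness and eigenvector centrality are omitted for brevity.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical toy corpus: each set holds the genes mentioned in one abstract.
# The gene names are placeholders, not real gene symbols.
DOCS = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_A", "GENE_C"},
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_D"},
    {"GENE_A", "GENE_D"},
    {"GENE_A", "GENE_C"},
    {"GENE_B", "GENE_E"},
    {"GENE_B", "GENE_E"},
]


def cooccurrence_counts(docs):
    """Count in how many documents each unordered gene pair appears together."""
    counts = Counter()
    for genes in docs:
        for pair in combinations(sorted(genes), 2):
            counts[pair] += 1
    return counts


def build_network(counts, min_count=2):
    """Keep pairs seen at least min_count times as undirected edges.

    Assumption: a plain count cutoff replaces the paper's classifiers.
    """
    adj = {}
    for (a, b), n in counts.items():
        if n >= min_count:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    if n < 2:
        return {g: 0.0 for g in adj}
    return {g: len(neighbors) / (n - 1) for g, neighbors in adj.items()}


def closeness_centrality(adj):
    """Reachable-node count divided by the sum of shortest-path lengths.

    Distances come from a breadth-first search; nodes outside the source's
    connected component are ignored.
    """
    scores = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[src] = (len(dist) - 1) / total if total else 0.0
    return scores


def top_genes(scores, k):
    """Return the k highest-scoring genes, breaking ties by name."""
    return [g for g, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


if __name__ == "__main__":
    network = build_network(cooccurrence_counts(DOCS))
    print("degree:   ", top_genes(degree_centrality(network), 3))
    print("closeness:", top_genes(closeness_centrality(network), 3))
```

In this toy network the gene that co-occurs with the most partners ranks first under both measures. On a real literature-derived network the measures can disagree, which is presumably why the authors report several of them.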
topic Text mining
Disease-gene association
Biological NLP
Biomedical literature
Genetic network
url http://link.springer.com/article/10.1186/s12859-019-2634-7