Analyzing a co-occurrence gene-interaction network to identify disease-gene association
Abstract. Background: Understanding the genetic networks and their role in chronic diseases (e.g., cancer) is one of the important objectives of biological researchers. In this work, we present a text mining system that constructs a gene-gene interaction network for the entire human genome and then pe...
Main Authors: | Amira Al-Aamri, Kamal Taha, Yousof Al-Hammadi, Maher Maalouf, Dirar Homouz |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2019-02-01 |
Series: | BMC Bioinformatics |
Subjects: | Text mining, Disease-gene association, Biological NLP, Biomedical literature, Genetic network |
Online Access: | http://link.springer.com/article/10.1186/s12859-019-2634-7 |
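The abstract describes building a gene network from literature co-occurrence and ranking genes by centrality. The following is a minimal sketch of that idea only, not the authors' system: it omits the rare-event classification step and the betweenness and eigenvector measures. The toy documents, the `min_count` threshold, and all function names are assumptions made for illustration, and the gene symbols are placeholders that imply nothing about real associations.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical toy corpus: each "abstract" reduced to the gene symbols it mentions.
documents = [
    {"BRCA1", "TP53", "AR"},
    {"BRCA1", "TP53"},
    {"TP53", "EGFR"},
    {"TP53", "AR"},
    {"EGFR", "KRAS"},
]

def cooccurrence_counts(docs):
    """Count how many documents mention each unordered gene pair."""
    counts = Counter()
    for genes in docs:
        for a, b in combinations(sorted(genes), 2):
            counts[(a, b)] += 1
    return counts

def build_graph(counts, min_count=1):
    """Keep pairs seen at least min_count times as undirected edges."""
    graph = {}
    for (a, b), n in counts.items():
        if n >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {g: 0.0 for g in graph}
    return {g: len(nbrs) / (n - 1) for g, nbrs in graph.items()}

def closeness_centrality(graph):
    """Reachable nodes divided by the sum of shortest-path lengths to them."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

def rank(scores, top=3):
    """Return the top-scoring genes, breaking ties alphabetically."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:top]

if __name__ == "__main__":
    graph = build_graph(cooccurrence_counts(documents))
    print("degree:", rank(degree_centrality(graph)))
    print("closeness:", rank(closeness_centrality(graph)))
```

Raising `min_count` is one simple way to discard pairs that co-occur only rarely; the paper instead classifies interactions with rare-event models, which this sketch does not attempt to reproduce.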
id |
doaj-d38ca3911476455c822fe2a7ac8d7d5a |
---|---|
record_format |
Article |
spelling |
doaj-d38ca3911476455c822fe2a7ac8d7d5a 2020-11-25T00:33:49Z eng BMC BMC Bioinformatics 1471-2105 2019-02-01 20 1 1 15 10.1186/s12859-019-2634-7 Analyzing a co-occurrence gene-interaction network to identify disease-gene association Amira Al-Aamri (Department of Electrical and Computer Engineering), Kamal Taha (Department of Electrical and Computer Engineering), Yousof Al-Hammadi (Department of Electrical and Computer Engineering), Maher Maalouf (Department of Industrial and Systems Engineering), Dirar Homouz (Department of Physics) Abstract. Background: Understanding the genetic networks and their role in chronic diseases (e.g., cancer) is one of the important objectives of biological researchers. In this work, we present a text mining system that constructs a gene-gene interaction network for the entire human genome and then performs network analysis to identify disease-related genes. We recognize the interacting genes based on their co-occurrence frequency within the biomedical literature and by employing linear and non-linear rare-event classification models. We analyze the constructed network of genes by using different network centrality measures to decide on the importance of each gene. Specifically, we apply betweenness, closeness, eigenvector, and degree centrality metrics to rank the central genes of the network and to identify possible cancer-related genes. Results: We evaluated the top 15 ranked genes for different cancer types (i.e., prostate, breast, and lung cancer). The average precisions for identifying breast, prostate, and lung cancer genes vary between 80% and 100%. In a prostate case study, the system predicted an average of 80% prostate-related genes. Conclusions: The results show that our system has the potential to improve the accuracy of predicting gene-gene interactions and disease-gene associations. We also conduct a prostate cancer case study by using the threshold property in logistic regression, and we compare our approach with some of the state-of-the-art methods. http://link.springer.com/article/10.1186/s12859-019-2634-7 Text mining Disease-gene association Biological NLP Biomedical literature Genetic network |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Amira Al-Aamri Kamal Taha Yousof Al-Hammadi Maher Maalouf Dirar Homouz |
spellingShingle |
Amira Al-Aamri Kamal Taha Yousof Al-Hammadi Maher Maalouf Dirar Homouz Analyzing a co-occurrence gene-interaction network to identify disease-gene association BMC Bioinformatics Text mining Disease-gene association Biological NLP Biomedical literature Genetic network |
author_sort |
Amira Al-Aamri |
title |
Analyzing a co-occurrence gene-interaction network to identify disease-gene association |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2019-02-01 |
description |
Abstract. Background: Understanding the genetic networks and their role in chronic diseases (e.g., cancer) is one of the important objectives of biological researchers. In this work, we present a text mining system that constructs a gene-gene interaction network for the entire human genome and then performs network analysis to identify disease-related genes. We recognize the interacting genes based on their co-occurrence frequency within the biomedical literature and by employing linear and non-linear rare-event classification models. We analyze the constructed network of genes by using different network centrality measures to decide on the importance of each gene. Specifically, we apply betweenness, closeness, eigenvector, and degree centrality metrics to rank the central genes of the network and to identify possible cancer-related genes. Results: We evaluated the top 15 ranked genes for different cancer types (i.e., prostate, breast, and lung cancer). The average precisions for identifying breast, prostate, and lung cancer genes vary between 80% and 100%. In a prostate case study, the system predicted an average of 80% prostate-related genes. Conclusions: The results show that our system has the potential to improve the accuracy of predicting gene-gene interactions and disease-gene associations. We also conduct a prostate cancer case study by using the threshold property in logistic regression, and we compare our approach with some of the state-of-the-art methods. |
topic |
Text mining Disease-gene association Biological NLP Biomedical literature Genetic network |
url |
http://link.springer.com/article/10.1186/s12859-019-2634-7 |