Investigating the Impact of Gene Cofunctionality in Predicting Gene Mutations of E. coli

Machine learning algorithms (MLAs) have recently been applied to predict gene mutations of Escherichia coli (E. coli) under different exposure conditions, with room for improvement in performance. In a bid to improve performance, we hypothesize that incorporating the interactions between genes will...


Bibliographic Details
Main Authors: Michael Okwori, Ali Eslami
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9195469/
id doaj-8a1a397044d346479cebfce983a3d36d
record_format Article
spelling doaj-8a1a397044d346479cebfce983a3d36d
2021-03-30T03:46:49Z
eng
IEEE
IEEE Access, ISSN 2169-3536, 2020-01-01, volume 8, pages 167397-167410
DOI 10.1109/ACCESS.2020.3023662, document 9195469
Michael Okwori (https://orcid.org/0000-0002-8827-8685), Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS, USA
Ali Eslami (https://orcid.org/0000-0002-7907-1930), Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS, USA
collection DOAJ
language English
format Article
sources DOAJ
author Michael Okwori
Ali Eslami
title Investigating the Impact of Gene Cofunctionality in Predicting Gene Mutations of E. coli
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Machine learning algorithms (MLAs) have recently been applied to predict gene mutations of Escherichia coli (E. coli) under different exposure conditions, with room for improvement in performance. In a bid to improve performance, we hypothesize that incorporating the interactions between genes will help MLAs make better predictions. To investigate this, we integrated protein-coding gene cofunctional networks into a mutation dataset of E. coli exposed to different conditions. Also, we proposed a feature-selection algorithm based on gene cofunctional networks to pick the most relevant exposure conditions. Then, we used the extended dataset to train a support vector classifier, an artificial neural network, and an ensemble of both MLAs. Separate models were trained for each of the protein-coding genes. Validation results showed that our approach improved both the area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precision-recall curve (AUPRC). A peak increase of 8.20% in AUPRC was observed. A similar analysis on selected genes, with ten or more mutation points for each gene, also showed improvement in the general performance of the MLAs. Out-of-sample testing on adaptive laboratory evolution experiments curated from the literature provided further evidence of an enhanced mutation-prediction performance, where a maximum 8.74% boost in the AUC was observed. Finally, we highlighted the genes with the most improved and most degraded predictions due to the additional information of the cofunctional genes. This work suggests that the functional relationship between genes may play a role in gene mutation and illustrates how the relationships might help to improve mutation prediction.
topic Mutation prediction
E. coli gene interactions
feature selection
machine learning
artificial neural network
support vector classifier
url https://ieeexplore.ieee.org/document/9195469/