Exploring Mouse Protein Function via Multiple Approaches.
Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse...
Main Authors: | Guohua Huang, Chen Chu, Tao Huang, Xiangyin Kong, Yunhua Zhang, Ning Zhang, Yu-Dong Cai |
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2016-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC5112993?pdf=render |
id |
doaj-cbd25a7de1b44585acc9e16ebfdee70c |
spelling |
doaj-cbd25a7de1b44585acc9e16ebfdee70c; 2020-11-24T22:11:27Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2016-01-01; vol. 11, no. 11, e0166580; doi:10.1371/journal.pone.0166580; Exploring Mouse Protein Function via Multiple Approaches.; Guohua Huang, Chen Chu, Tao Huang, Xiangyin Kong, Yunhua Zhang, Ning Zhang, Yu-Dong Cai; http://europepmc.org/articles/PMC5112993?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
author |
Guohua Huang, Chen Chu, Tao Huang, Xiangyin Kong, Yunhua Zhang, Ning Zhang, Yu-Dong Cai |
title |
Exploring Mouse Protein Function via Multiple Approaches. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2016-01-01 |
description |
Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in both the leave-one-out and ten-fold cross-validations. In the leave-one-out cross-validation, the similarity-based approach alone achieved an accuracy of 0.8756, but it was unable to predict the functions of proteins with no homologs. By comparison, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786; although this is lower than the accuracy of the similarity-based approach, it could predict the functions of almost all proteins, including proteins with no homologs. The combined method therefore balanced the strengths and weaknesses of both approaches to achieve effective overall performance. Furthermore, the results of the ten-fold cross-validation indicate that the combined method remains effective and stable when no close homologs are available. However, the accuracy of the predictions can only be assessed against the protein functions known at present, and many protein functions remain unknown. An examination of proteins whose 1st-order predicted functions were wrong but whose 2nd-order predicted functions were correct showed that the 1st-order wrongly predicted functions were closely associated with the genes encoding those proteins. These so-called wrongly predicted functions could therefore prove correct upon future experimental verification, and the true accuracy of the presented method may be considerably higher than these estimates. |
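The abstract describes the method only as a "sequential combination" of three approaches. The following is a minimal sketch of one way such a cascade could work. It assumes two things the record does not state. First, each later approach is consulted only when the earlier ones cannot make a prediction. Second, 1st-order accuracy is the fraction of proteins whose top-ranked predicted function is among their known functions. Every function name, score and toy record below is hypothetical. The last stage uses plain 20-dimensional amino acid composition with nearest-neighbour copying as a simplified stand-in for the paper's pseudo amino acid composition features. Homolog similarity scores are taken as given rather than computed.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Set

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# A stage returns candidate functions ranked best-first, or None when it has
# no basis for a prediction (e.g. the protein has no annotated homolog).
Stage = Callable[[str], Optional[List[str]]]


def cascade_predict(protein: str, stages: Sequence[Stage]) -> List[str]:
    """Return the ranking from the first stage that can make a prediction."""
    for stage in stages:
        ranking = stage(protein)
        if ranking:
            return ranking
    return []


def similarity_stage(homologs: Dict[str, Dict[str, float]],
                     functions: Dict[str, Set[str]]) -> Stage:
    """Rank functions by the best similarity score of a homolog carrying them."""
    def predict(protein: str) -> Optional[List[str]]:
        best: Dict[str, float] = {}
        for other, score in homologs.get(protein, {}).items():
            if other == protein or other not in functions:
                continue  # never use the query's own annotation
            for f in functions[other]:
                best[f] = max(best.get(f, 0.0), score)
        return sorted(best, key=lambda f: (-best[f], f)) or None
    return predict


def interaction_stage(partners: Dict[str, Set[str]],
                      functions: Dict[str, Set[str]]) -> Stage:
    """Rank functions by how many annotated interaction partners carry them."""
    def predict(protein: str) -> Optional[List[str]]:
        votes = Counter(f for other in partners.get(protein, set())
                        if other != protein and other in functions
                        for f in functions[other])
        return sorted(votes, key=lambda f: (-votes[f], f)) or None
    return predict


def composition(seq: str) -> List[float]:
    """Plain 20-dimensional amino acid composition (no sequence-order terms)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def composition_stage(sequences: Dict[str, str],
                      functions: Dict[str, Set[str]]) -> Stage:
    """Copy the functions of the nearest other annotated protein by composition."""
    def predict(protein: str) -> Optional[List[str]]:
        query = composition(sequences[protein])
        others = [p for p in functions if p != protein and p in sequences]
        if not others:
            return None
        nearest = min(others,
                      key=lambda p: math.dist(query, composition(sequences[p])))
        return sorted(functions[nearest])
    return predict


def first_order_accuracy(functions: Dict[str, Set[str]],
                         stages: Sequence[Stage]) -> float:
    """Leave-one-out 1st-order accuracy (each stage ignores the query's own label)."""
    hits = 0
    for protein, known in functions.items():
        ranking = cascade_predict(protein, stages)
        if ranking and ranking[0] in known:
            hits += 1
    return hits / len(functions)


# Hypothetical toy data: protein IDs, function labels, scores and sequences
# are invented for illustration only.
functions = {"P1": {"F1"}, "P2": {"F1"}, "P3": {"F2"}, "P4": {"F2"}, "P5": {"F3"}}
homologs = {"P1": {"P2": 0.9}, "P2": {"P1": 0.9}}
partners = {"P3": {"P4"}, "P4": {"P3", "P5"}}
sequences = {"P1": "ACDAC", "P2": "ACDAA", "P3": "KKLLM",
             "P4": "KKLLV", "P5": "WWYYW"}

stages = [similarity_stage(homologs, functions),
          interaction_stage(partners, functions),
          composition_stage(sequences, functions)]

print(first_order_accuracy(functions, stages))  # prints 0.8
```

On this toy data, P1 and P2 are handled by the similarity stage. P3 and P4 have no homologs, so they fall through to the interaction stage. P5 has no homologs or partners and reaches the composition stage. Because no other protein carries its function F3, P5 is mispredicted, and the sketch reports a 1st-order accuracy of 0.8. Leave-one-out evaluation depends on excluding the query protein's own annotation inside every stage. Without that exclusion, each protein could simply copy its own label.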
url |
http://europepmc.org/articles/PMC5112993?pdf=render |