Exploring Mouse Protein Function via Multiple Approaches.

Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse...


Bibliographic Details
Main Authors: Guohua Huang, Chen Chu, Tao Huang, Xiangyin Kong, Yunhua Zhang, Ning Zhang, Yu-Dong Cai
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2016-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC5112993?pdf=render
id doaj-cbd25a7de1b44585acc9e16ebfdee70c
record_format Article
spelling doaj-cbd25a7de1b44585acc9e16ebfdee70c; 2020-11-24T22:11:27Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2016-01-01; 11; 11; e0166580; 10.1371/journal.pone.0166580; Exploring Mouse Protein Function via Multiple Approaches.
collection DOAJ
sources DOAJ
issn 1932-6203
description Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. By comparison, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although this accuracy was lower than that of the similarity-based approach, the composition-based approach could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when no close homologues are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. Therefore, the accuracy of the presented method may be much higher in reality.
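
The sequential combination described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the fallback rule (move to the next approach only when the current one returns no prediction), the function names, the protein identifiers and the function labels are all assumptions made for illustration, and the three predictors are toy stand-ins for the similarity-based, interaction-based and pseudo amino acid composition-based approaches.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A predictor returns function labels ranked from 1st-order downward,
# or None when it cannot make any prediction for the protein.
Predictor = Callable[[str], Optional[Sequence[str]]]


def predict_sequentially(
    protein_id: str,
    stages: Sequence[Tuple[str, Predictor]],
) -> Tuple[Optional[str], List[str]]:
    """Try each stage in order and return the first non-empty ranking."""
    for stage_name, predictor in stages:
        ranked = predictor(protein_id)
        if ranked:  # this stage produced at least one candidate function
            return stage_name, list(ranked)
    return None, []  # no stage could predict anything


def lookup_predictor(table: Dict[str, List[str]]) -> Predictor:
    """Hypothetical stand-in: answers only for proteins present in `table`."""
    return lambda protein_id: table.get(protein_id)


# Toy data (assumed, not from the paper): P1 has a homologue, P2 has only
# an interaction partner, P3 has neither.
similarity = lookup_predictor({"P1": ["kinase activity", "ATP binding"]})
interaction = lookup_predictor({"P2": ["ion transport"]})


def composition(protein_id: str) -> List[str]:
    """Hypothetical stand-in for a sequence-composition model that always answers."""
    return ["protein binding", "catalytic activity"]


STAGES = [
    ("similarity", similarity),
    ("interaction", interaction),
    ("composition", composition),
]

for pid in ("P1", "P2", "P3"):
    stage, ranked = predict_sequentially(pid, STAGES)
    print(pid, stage, ranked[0])
```

In this sketch the first label in each ranking plays the role of the 1st-order prediction and the second label the 2nd-order prediction. Placing the composition-based stand-in last mirrors the abstract's point that it covers proteins the similarity-based approach cannot handle.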