Application of nonnegative matrix factorization to improve profile-profile alignment features for fold recognition and remote homolog detection

Abstract. Background: Nonnegative matrix factorization (NMF) is a feature extraction method that has the property of intuitive part-based representation of the original features. This unique ability makes NMF a potentially promising method for biological...

Bibliographic Details
Main Authors: Lee Soo-Young, Lee Jaehyung, Jung Inkyung, Kim Dongsup
Format: Article
Language: English
Published: BMC 2008-07-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/298
id doaj-e765e36adb51492e87a14dc535c9acb8
record_format Article
spelling doaj-e765e36adb51492e87a14dc535c9acb8 | 2020-11-24T22:01:01Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2008-07-01 | 9 | 1 | 298 | 10.1186/1471-2105-9-298 | Application of nonnegative matrix factorization to improve profile-profile alignment features for fold recognition and remote homolog detection | Lee Soo-Young | Lee Jaehyung | Jung Inkyung | Kim Dongsup | http://www.biomedcentral.com/1471-2105/9/298
collection DOAJ
language English
format Article
sources DOAJ
author Lee Soo-Young
Lee Jaehyung
Jung Inkyung
Kim Dongsup
author_sort Lee Soo-Young
title Application of nonnegative matrix factorization to improve profile-profile alignment features for fold recognition and remote homolog detection
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2008-07-01
description Abstract
Background: Nonnegative matrix factorization (NMF) is a feature extraction method that has the property of intuitive part-based representation of the original features. This unique ability makes NMF a potentially promising method for biological sequence analysis. Here, we apply NMF to fold recognition and remote homolog detection problems. Recent studies have shown that combining support vector machines (SVM) with profile-profile alignments markedly improves the performance of fold recognition and remote homolog detection. However, it is not clear which parts of sequences are essential for the performance improvement.
Results: The performance of fold recognition and remote homolog detection using NMF features is compared to that of the unmodified profile-profile alignment (PPA) features by estimating Receiver Operating Characteristic (ROC) scores. The overall performance is noticeably improved. For fold recognition at the fold level, an SVM with NMF features recognizes 30% of homologous proteins at ROC scores > 0.99, whereas the original PPA features, HHsearch, and PSI-BLAST recognize almost none. For detecting remote homologs that are related at the superfamily level, NMF features also achieve higher performance than the original PPA features. At ROC50 scores > 0.90, 25% of proteins correctly detect remotely related proteins with NMF features, whereas only 1% do so with the original PPA features. In addition, we investigate the effect of the number of positive training examples and the number of basis vectors on performance improvement. We also analyze the ability of NMF to extract essential features by comparing NMF basis vectors with functionally important sites and structurally conserved regions of proteins. The results show that NMF basis vectors have significant overlap with functional sites from PROSITE and with structurally conserved regions from the multiple structural alignments generated by MUSTANG. The correlation between NMF basis vectors and biologically essential parts of proteins supports our conjecture that NMF basis vectors can explicitly represent important sites of proteins.
Conclusion: The present work demonstrates that applying NMF to profile-profile alignments can reveal essential features of proteins and that these features significantly improve the performance of fold recognition and remote homolog detection.
url http://www.biomedcentral.com/1471-2105/9/298
_version_ 1725842313469493248
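
Illustrative note on the abstract: the "part-based representation" attributed to NMF comes from factorizing a nonnegative feature matrix V into two nonnegative factors, V ≈ W·H, where the rows of H act as basis vectors and the rows of W give each sample's weights on them. The sketch below is a minimal pure-Python version using the widely used multiplicative-update rules for the squared-error objective. This is an assumption made for illustration: the abstract does not say which NMF algorithm the authors used, and the toy matrix and the function names (`nmf`, `frobenius_error`) are hypothetical, not taken from the paper or from real profile-profile alignment data.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of the residual V - W*H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize nonnegative V (n x m) into W (n x rank) and H (rank x m).

    Uses multiplicative updates, which keep every entry nonnegative
    as long as the starting values are nonnegative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Hypothetical toy data: 5 samples x 4 features built from two "parts".
    toy = [[4, 4, 0, 0],
           [2, 2, 0, 0],
           [0, 0, 3, 3],
           [0, 0, 1, 1],
           [4, 4, 3, 3]]
    w, h = nmf(toy, rank=2)
    print("basis vectors (rows of H):", h)
    print("squared reconstruction error:", frobenius_error(toy, w, h))
```

In a setting like the one the abstract describes, each row of V would hold one protein's nonnegative alignment-derived features, and the weights in W would be the reduced features passed to a classifier such as an SVM; that mapping is an interpretation of the abstract, not a detail it states.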