An unsupervised classification scheme for improving predictions of prokaryotic TIS

Abstract

Background: Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a num...

Bibliographic Details
Main Authors: Meinicke Peter, Tech Maike
Format: Article
Language: English
Published: BMC 2006-03-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/7/121
id doaj-6fa8907924b24ecaa1b2d91d86a69b53
record_format Article
spelling doaj-6fa8907924b24ecaa1b2d91d86a69b53 2020-11-25T01:32:31Z eng BMC BMC Bioinformatics 1471-2105 2006-03-01 7 1 121 10.1186/1471-2105-7-121 An unsupervised classification scheme for improving predictions of prokaryotic TIS Meinicke Peter Tech Maike http://www.biomedcentral.com/1471-2105/7/121
collection DOAJ
language English
format Article
sources DOAJ
author Meinicke Peter
Tech Maike
title An unsupervised classification scheme for improving predictions of prokaryotic TIS
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2006-03-01
description Background: Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a number of post-processing tools have been proposed for improving the annotation of prokaryotic TIS. However, inherent difficulties of these approaches arise from the considerable variation of TIS characteristics across different species. Therefore prior assumptions about the properties of prokaryotic gene starts may cause suboptimal predictions for newly sequenced genomes with TIS signals differing from those of well-investigated genomes. Results: We introduce a clustering algorithm for completely unsupervised scoring of potential TIS, based on positionally smoothed probability matrices. The algorithm requires an initial gene prediction and the genomic sequence of the organism to perform the reannotation. As compared with other methods for improving predictions of gene starts in bacterial genomes, our approach is not based on any specific assumptions about prokaryotic TIS. Despite the generality of the underlying algorithm, the prediction rate of our method is competitive on experimentally verified test data from E. coli and B. subtilis. Regarding genomes with high G+C content, in contrast to some previously proposed methods, our algorithm also provides good performance on P. aeruginosa, B. pseudomallei and R. solanacearum. Conclusion: On reliable test data we showed that our method provides good results in post-processing the predictions of the widely-used program GLIMMER. The underlying clustering algorithm is robust with respect to variations in the initial TIS annotation and does not require specific assumptions about prokaryotic gene starts. These features are particularly useful on genomes with high G+C content. The algorithm has been implemented in the tool »TICO« (TIs COrrector) which is publicly available from our web site.
url http://www.biomedcentral.com/1471-2105/7/121