A probabilistic classifier for olfactory receptor pseudogenes

Abstract. Background: Olfactory receptors (ORs), the largest mammalian gene superfamily (900–1400 genes), have >50% pseudogenes in humans. While most of these inactive genes are identified via coding frame (nonsense) disruptions, seemingly intact genes...


Bibliographic Details
Main Authors: Lancet Doron, Aloni Ronny, Menashe Idan
Format: Article
Language: English
Published: BMC 2006-08-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/7/393
id doaj-c3cbf0355a574b56b45872628fba0d65
record_format Article
spelling doaj-c3cbf0355a574b56b45872628fba0d65 2020-11-24T20:51:44Z eng BMC BMC Bioinformatics 1471-2105 2006-08-01 7 1 393 10.1186/1471-2105-7-393 A probabilistic classifier for olfactory receptor pseudogenes Lancet Doron; Aloni Ronny; Menashe Idan http://www.biomedcentral.com/1471-2105/7/393
collection DOAJ
language English
format Article
sources DOAJ
author Lancet Doron
Aloni Ronny
Menashe Idan
title A probabilistic classifier for olfactory receptor pseudogenes
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2006-08-01
description Abstract
Background: Olfactory receptors (ORs), the largest mammalian gene superfamily (900–1400 genes), have >50% pseudogenes in humans. While most of these inactive genes are identified via coding frame (nonsense) disruptions, seemingly intact genes may also be inactive due to other deleterious (missense) mutations. An ultimate assessment of the actual size of the functional human OR repertoire thus requires an accurate distinction between genes and pseudogenes.
Results: To characterize inactive ORs with an intact open reading frame, we have developed a probabilistic Classifier for Olfactory Receptor Pseudogenes (CORP). This algorithm is based on deviations from a functionally crucial consensus, comprising sixty highly conserved positions identified by a comparison of two evolutionarily constrained OR repertoires (mouse and dog) with a small pseudogene fraction. We used a logistic regression analysis to assign appropriate coefficients to the conserved positions, thus achieving maximal separation between active and inactive ORs. Consequently, the algorithm identified only 5% of the mouse functional ORs as pseudogenes, setting an upper limit of 0.05 on the false positive rate. Finally, we used this algorithm to classify the 384 purportedly intact human OR genes. Of these, 135 were predicted as likely encoding non-functional proteins, and 38 were segregating between active and inactive forms due to missense polymorphisms.
Conclusion: We demonstrated that the CORP algorithm is capable of distinguishing between functional and non-functional OR genes with high precision, even when the encoded proteins differ by a single amino acid. Using the CORP algorithm, we predict that ~70% of human OR genes are likely non-functional pseudogenes, a much higher number than hitherto suspected. The method we present may be employed for better annotation of inactive members in other gene families as well.
CORP algorithm is available at: http://bioportal.weizmann.ac.il/HORDE/CORP/
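The abstract describes scoring deviations from a conserved consensus with logistic regression coefficients. The following is a minimal Python sketch of that general idea, not the published CORP implementation. The binary deviation encoding, the five-position toy consensus, the coefficient values, the intercept, and the function names `deviation_vector` and `pseudogene_probability` are all hypothetical and chosen only for illustration; the record does not state the actual sixty positions or their fitted coefficients.

```python
import math

def deviation_vector(residues, consensus):
    """Return 1 where a residue differs from the consensus residue, else 0.

    Assumption: deviations are encoded as simple binary indicators.
    """
    return [0 if r == c else 1 for r, c in zip(residues, consensus)]

def pseudogene_probability(deviations, coefficients, intercept):
    """Logistic model: probability that a sequence is non-functional."""
    z = intercept + sum(w * d for w, d in zip(coefficients, deviations))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: 5 conserved positions instead of the paper's 60.
consensus = ["M", "Y", "D", "R", "Y"]
coefficients = [1.2, 0.8, 2.0, 2.5, 0.5]  # illustrative weights, not fitted values
intercept = -3.0                           # illustrative baseline

intact = ["M", "Y", "D", "R", "Y"]   # matches the consensus everywhere
mutated = ["M", "Y", "D", "H", "Y"]  # one missense deviation at position 4

p_intact = pseudogene_probability(
    deviation_vector(intact, consensus), coefficients, intercept)
p_mutated = pseudogene_probability(
    deviation_vector(mutated, consensus), coefficients, intercept)

print(f"intact:  {p_intact:.3f}")
print(f"mutated: {p_mutated:.3f}")
```

In this toy setup a single deviation at a heavily weighted position raises the predicted probability substantially, which mirrors the abstract's point that one amino acid difference can change the classification. A real classifier would fit the coefficients on labeled functional and non-functional sequences and pick a probability threshold consistent with the reported false positive limit.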
url http://www.biomedcentral.com/1471-2105/7/393