A probabilistic classifier for olfactory receptor pseudogenes
Main Authors: | Lancet, Doron; Aloni, Ronny; Menashe, Idan |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2006-08-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/7/393 |
id | doaj-c3cbf0355a574b56b45872628fba0d65 |
---|---|
record_format | Article |
spelling | doaj-c3cbf0355a574b56b45872628fba0d65; 2020-11-24T20:51:44Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2006-08-01; vol. 7, issue 1, article 393; doi:10.1186/1471-2105-7-393; A probabilistic classifier for olfactory receptor pseudogenes; Lancet Doron; Aloni Ronny; Menashe Idan; http://www.biomedcentral.com/1471-2105/7/393 |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Lancet, Doron; Aloni, Ronny; Menashe, Idan |
title | A probabilistic classifier for olfactory receptor pseudogenes |
publisher | BMC |
series | BMC Bioinformatics |
issn | 1471-2105 |
publishDate | 2006-08-01 |
description | <p>Abstract</p> <p>Background</p> <p>Olfactory receptors (ORs), which form the largest mammalian gene superfamily (900–1400 genes), include more than 50% pseudogenes in humans. While most of these inactive genes are identified by disruptions of the coding frame (nonsense mutations), seemingly intact genes may also be inactive because of other deleterious (missense) mutations. A definitive assessment of the size of the functional human OR repertoire therefore requires an accurate distinction between genes and pseudogenes.</p> <p>Results</p> <p>To characterize inactive ORs with an intact open reading frame, we developed a probabilistic Classifier for Olfactory Receptor Pseudogenes (CORP). The algorithm is based on deviations from a functionally crucial consensus of sixty highly conserved positions, identified by comparing two evolutionarily constrained OR repertoires with small pseudogene fractions (mouse and dog). We used logistic regression to assign coefficients to the conserved positions, thereby achieving maximal separation between active and inactive ORs. As a result, the algorithm classified only 5% of functional mouse ORs as pseudogenes, setting an upper limit of 0.05 on the false positive rate. Finally, we used this algorithm to classify the 384 purportedly intact human OR genes. Of these, 135 were predicted as likely to encode non-functional proteins, and 38 segregate between active and inactive forms because of missense polymorphisms.</p> <p>Conclusion</p> <p>We demonstrated that the CORP algorithm can distinguish between functional and non-functional OR genes with high precision, even when the encoded proteins differ by a single amino acid. Using the CORP algorithm, we predict that ~70% of human OR genes are likely non-functional pseudogenes, a much higher fraction than previously suspected. The method may also be used to improve the annotation of inactive members of other gene families.</p> <p>The CORP algorithm is available at: <url>http://bioportal.weizmann.ac.il/HORDE/CORP/</url></p> |
url | http://www.biomedcentral.com/1471-2105/7/393 |
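The abstract describes CORP as scoring a sequence by its deviations from a consensus at highly conserved positions, with per-position weights fitted by logistic regression. The Python sketch below illustrates that general idea on toy data; it is not the authors' CORP implementation. Everything in it is a hypothetical stand-in: the function names, the conservation cutoff `min_freq`, the binary deviation encoding, the batch gradient-descent fit, the 0.5 decision cutoff, and the short toy sequences (the paper's sixty positions came from comparing the mouse and dog OR repertoires, which this sketch does not reproduce).

```python
import math
from collections import Counter


def conserved_consensus(alignment, min_freq=0.9):
    """Map column -> consensus residue for columns whose most common
    residue reaches min_freq (a hypothetical cutoff) in the alignment."""
    consensus = {}
    for col in range(len(alignment[0])):
        residue, count = Counter(seq[col] for seq in alignment).most_common(1)[0]
        if residue != "-" and count / len(alignment) >= min_freq:
            consensus[col] = residue
    return consensus


def deviation_vector(seq, consensus):
    """1.0 where seq differs from the consensus residue, else 0.0 (columns in order)."""
    return [0.0 if seq[col] == res else 1.0 for col, res in sorted(consensus.items())]


def sigmoid(z):
    # Numerically stable logistic function (no overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; y = 1 marks a pseudogene."""
    w, b, m = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def pseudogene_probability(seq, consensus, w, b):
    """Logistic score of the weighted deviations from the consensus."""
    x = deviation_vector(seq, consensus)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def false_positive_rate(functional_seqs, consensus, w, b, cutoff=0.5):
    """Fraction of known-functional sequences that are called pseudogenes."""
    calls = [pseudogene_probability(s, consensus, w, b) >= cutoff for s in functional_seqs]
    return sum(calls) / len(functional_seqs)


# Toy data (hypothetical): a reference alignment used to pick conserved
# columns, plus labelled training sequences.
reference = ["MKLVAY", "MKLVAF", "MKLVAY", "MKLVAY"]
consensus = conserved_consensus(reference)   # columns 0-4; column 5 is variable
functional = ["MKLVAY", "MKLVAF", "MKLVSY"]   # label 0
pseudogenes = ["MRLVAY", "MKLDAY", "MRLDAY"]  # label 1
X = [deviation_vector(s, consensus) for s in functional + pseudogenes]
y = [0] * len(functional) + [1] * len(pseudogenes)
w, b = fit_logistic(X, y)

for s in functional + pseudogenes:
    print(s, round(pseudogene_probability(s, consensus, w, b), 3))
print("false positive rate:", false_positive_rate(functional, consensus, w, b))
```

Fitting one weight per conserved position lets a deviation at a critical site (columns 1 and 3 in the toy data) count for more than a tolerated substitution (column 4), which is the role the abstract assigns to the logistic-regression coefficients. The `false_positive_rate` helper mirrors the kind of check reported in the abstract (functional genes wrongly called pseudogenes); its toy output says nothing about CORP's actual 5% figure.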