Identifying and Quantifying Orphan Protein Sequences in Fungi

For large regions of many proteins, and even entire proteins, no homology to known domains or proteins can be detected. These sequences are often referred to as orphans. Surprisingly, it has been reported that the large number of orphans is sustained in spite of a rapid increase of available genomic...


Bibliographic Details
Main Authors: Ekman, Diana, Elofsson, Arne
Format: Others
Language: English
Published: Stockholms universitet, Institutionen för biokemi och biofysik 2010
Subjects: evolution; protein domain; orphan protein; fungi; natural sciences
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-49277
id ndltd-UPSALLA1-oai-DiVA.org-su-49277
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-su-49277 2013-05-15T03:55:42Z
Article in journal (info:eu-repo/semantics/article, text)
Journal of Molecular Biology, 0022-2836, 2010, 396:2, pp. 396-405
doi:10.1016/j.jmb.2009.11.053
ISI:000274980400013
http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-49277
application/pdf
info:eu-repo/semantics/openAccess
info:eu-repo/grantAgreement/EC/FP7/503567; 512092
collection NDLTD
language English
format Others
sources NDLTD
topic evolution
protein domain
orphan protein
fungi
NATURAL SCIENCES
NATURVETENSKAP
description For large regions of many proteins, and even entire proteins, no homology to known domains or proteins can be detected. These sequences are often referred to as orphans. Surprisingly, it has been reported that the large number of orphans is sustained in spite of a rapid increase of available genomic sequences. However, it is believed that de novo creation of coding sequences is rare in comparison to mechanisms such as domain shuffling and gene duplication; hence, most sequences should have homologs in other genomes. To investigate this, the sequences of 19 complete fungi genomes were compared. By using the phylogenetic relationship between these genomes, we could identify potentially de novo created orphans in Saccharomyces cerevisiae. We found that only a small fraction, <2%, of the S. cerevisiae proteome is orphan, which confirms that de novo creation of coding sequences is indeed rare. Furthermore, we found it necessary to compare the most closely related species to distinguish between de novo created sequences and rapidly evolving sequences where homologs are present but cannot be detected. Next, the orphan proteins (OPs) and orphan domains (ODs) were characterized. First, it was observed that both OPs and ODs are short. In addition, at least some of the OPs have been shown to be functional in experimental assays, showing that they are not pseudogenes. Furthermore, in contrast to what has been reported before and what is seen for older orphans, S. cerevisiae-specific ODs and proteins are not more disordered than other proteins. This might indicate that many of the older, and earlier classified, orphans indeed are fast-evolving sequences. Finally, >90% of the detected ODs are located at the protein termini, which suggests that these orphans could have been created by mutations that have affected the start or stop codons.
author Ekman, Diana
Elofsson, Arne
author_facet Ekman, Diana
Elofsson, Arne
author_sort Ekman, Diana
title Identifying and Quantifying Orphan Protein Sequences in Fungi
publisher Stockholms universitet, Institutionen för biokemi och biofysik
publishDate 2010
url http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-49277