Orphan Genes Bioinformatics : Identification and properties of de novo created genes

Bibliographic Details
Main Author: Basile, Walter
Format: Doctoral Thesis
Language: English
Published: Stockholms universitet, Institutionen för biokemi och biofysik 2017
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-149168
http://nbn-resolving.de/urn:isbn:978-91-7797-085-9
http://nbn-resolving.de/urn:isbn:978-91-7797-086-6
Description
Summary: Even today, many genes have no known homolog. These "orphans" are found in all species, from viruses to prokaryotes and eukaryotes. For a portion of these genes, we might simply not have enough data to find homologs yet. Some of them are imported from taxonomically distant organisms via lateral transfer; others have homologs that have mutated beyond the point of recognition. However, a sizeable fraction of orphan genes is unambiguously created via "de novo" mechanisms. The study of such novel genes can contribute to our understanding of the emergence of functional novelty and the adaptation of species to new ecological niches. In this work, we first survey the field of orphan studies and illustrate some of the common issues. Next, we analyze some of the intrinsic properties of orphan proteins, including secondary structure elements and intrinsic structural disorder; specifically, we observe that in young proteins the relationship between these properties and the G+C content of their coding sequence is stronger than in older proteins. We then tackle some of the methodological problems often found in orphan studies. We find that using evolutionarily close species and sensitive, state-of-the-art homology recognition methods is instrumental to the identification of a set of orphans enriched in de novo created ones. Finally, we compare how intrinsic disorder is distributed in bacteria versus eukaryotes. Eukaryotic proteins are longer and more disordered; the difference is to be attributed primarily to eukaryotic-specific domains and linker regions. In these sections of the proteins, a higher frequency of the disorder-promoting amino acid serine can be observed in eukaryotes.

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 3: Submitted. Paper 4: Manuscript.