Adaptive Evolution of Long Non-Coding RNAs
Language: English
Published: 2018
Online Access:
http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa2-323898
https://ul.qucosa.de/id/qucosa%3A32389
https://ul.qucosa.de/api/qucosa%3A32389/attachment/ATT-0/
Summary: The chimpanzee is the closest living relative of modern humans. Although the phenotypic differences between the two species are striking, their genomic sequences differ surprisingly little. Species-specific changes and positive selection have mostly been found in proteins, but ncRNAs are also involved, including the largely uncharacterized class of long ncRNAs (lncRNAs). A notable example is the Human Accelerated Region 1 (HAR1), the region of the human genome with the highest number of human-specific substitutions: 18 in 118 nucleotides. HAR1 lies in a pair of overlapping lncRNAs that are expressed during a crucial period of brain development. Importantly, the evolution of many ncRNAs is shaped by structural rather than sequence constraints. Several methods have been developed to detect negative selection on ncRNA structures, but none so far to detect positive selection.
This motivated us to develop a novel method, the SSS-test (Selection on the Secondary Structure test), which identifies positive selection from an excess of structure-changing mutations. It uses reports from RNAsnp, a tool that quantifies the effect of SNPs on RNA secondary structure, and applies multiple-testing correction to the observations to generate selection scores. Insertions and deletions (indels) are handled separately, using rank statistics and a background model. The SNP and indel scores are then combined into a final selection score for each input sequence, which indicates the type of selection. We benchmarked the SSS-test on biological and synthetic datasets and obtained consistent signals. Applying it to a lncRNA database, we identified 110 human lncRNAs as candidates for adaptive evolution in humans.
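The scoring pipeline described above can be illustrated with a small Python sketch. It is not the SSS-test implementation: the input layout, the 0.1 threshold for calling a SNP structure-changing, the binomial null model, Fisher's method for combining the SNP and indel p-values, and Benjamini-Hochberg correction across sequences are all assumptions chosen for illustration. The only link to RNAsnp is the assumption that it supplies one structural-change p-value per SNP, and every function name here is hypothetical.

```python
# Hypothetical SSS-test-style scoring sketch; not the published SSS-test.
# Assumed input per sequence: RNAsnp-like structural-change p-values for its SNPs,
# one observed indel effect, and a background sample of indel effects.
from math import comb, log
from typing import Dict, List


def snp_pvalue(snp_pvalues: List[float], threshold: float = 0.1) -> float:
    """Binomial upper-tail p-value for an excess of structure-changing SNPs.

    Under the null the p-values are uniform, so each SNP falls below
    `threshold` (and counts as structure-changing) with probability `threshold`.
    """
    n = len(snp_pvalues)
    k = sum(1 for p in snp_pvalues if p < threshold)
    return sum(comb(n, i) * threshold**i * (1 - threshold) ** (n - i)
               for i in range(k, n + 1))


def indel_pvalue(observed: float, background: List[float]) -> float:
    """Empirical upper-tail rank of the observed indel effect (add-one smoothed)."""
    at_least = sum(1 for b in background if b >= observed)
    return (at_least + 1) / (len(background) + 1)


def fisher_combine(p1: float, p2: float) -> float:
    """Fisher's method for two p-values: chi-square tail with 4 df in closed form."""
    t = p1 * p2
    return 0.0 if t == 0 else t * (1 - log(t))


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def selection_scores(sequences: List[Dict], threshold: float = 0.1) -> List[float]:
    """One FDR-adjusted score per sequence; small values flag positive-selection candidates."""
    combined = [fisher_combine(snp_pvalue(s["snps"], threshold),
                               indel_pvalue(s["indel"], s["background"]))
                for s in sequences]
    return benjamini_hochberg(combined)


if __name__ == "__main__":
    background = [0.1, 0.3, 0.2, 0.4]
    seqs = [
        {"snps": [0.01, 0.02, 0.03, 0.04], "indel": 5.0, "background": background},
        {"snps": [0.8, 0.5, 0.9, 0.7], "indel": 0.1, "background": background},
    ]
    print(selection_scores(seqs))  # the first sequence scores far lower
```

Only the positive-selection direction (an excess of structure-changing SNPs) is scored in this sketch; testing the lower tail in the same way would be the analogous check for negative selection.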
Although lncRNAs are poorly conserved at the sequence level, their splice sites are conserved and provide ideal guides for orthology annotation. To provide an alternative way of assigning orthology to lncRNAs, we developed the 'buildOrthologs' tool. It takes as input a map of orthologous splice sites created by the SpliceMap tool and applies a greedy algorithm to reconstruct valid orthologous transcripts. We applied this novel approach to create a well-curated catalog of lncRNA orthologs for primate species.
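A greedy reconstruction of this kind might look like the Python sketch below. It is hypothetical and not the buildOrthologs code: the splice-site map is assumed to be a plain dictionary from source coordinates to ortholog coordinates (covering the transcript ends too, for simplicity), and "valid" is assumed to mean at least `min_exons` mapped exons that are non-empty, non-overlapping and collinear.

```python
# Hypothetical greedy ortholog-transcript reconstruction; not the buildOrthologs code.
# Assumed layout: exons as (start, end) source coordinates and a dict mapping each
# source splice site (and, for simplicity, transcript end) to its ortholog coordinate.
from typing import Dict, List, Optional, Tuple

Exon = Tuple[int, int]


def build_ortholog(exons: List[Exon], site_map: Dict[int, int],
                   min_exons: int = 2) -> Optional[List[Exon]]:
    """Walk exons in transcript order and greedily keep every exon whose mapped
    boundaries exist and extend the growing transcript collinearly."""
    result: List[Exon] = []
    last_end: Optional[int] = None
    for start, end in exons:
        t_start, t_end = site_map.get(start), site_map.get(end)
        if t_start is None or t_end is None:
            continue  # a boundary has no orthologous site: drop this exon
        if t_start >= t_end:
            continue  # mapped exon would be empty or inverted
        if last_end is not None and t_start <= last_end:
            continue  # would overlap the previous exon or break collinearity
        result.append((t_start, t_end))
        last_end = t_end
    # Too few surviving exons: no valid spliced ortholog can be built.
    return result if len(result) >= min_exons else None


if __name__ == "__main__":
    exons = [(100, 200), (300, 400), (500, 600)]
    site_map = {100: 1100, 200: 1210, 300: 1290, 400: 1400, 500: 1520, 600: 1630}
    print(build_ortholog(exons, site_map))
    # [(1100, 1210), (1290, 1400), (1520, 1630)]
```

Because each exon is accepted as soon as it is consistent with the exons already kept, an early choice can block a better combination further along the transcript; that is the usual trade-off of a greedy strategy against an exhaustive search.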
Finally, to understand the structural evolution of ncRNAs in full detail, we added a temporal dimension to the analysis: in what order did the mutations in a structure occur since its origin? This is a combinatorial problem in which the mutations separating the ancestral and extant sequences must be put in order. To solve it, we developed the 'mutationOrder' tool, which uses dynamic programming to consider every possible order of mutations and assign a probability to each path. We applied this novel tool to HAR1 as a case study and found that the co-optimal paths, which are equally likely to have occurred, share qualitatively similar features. In general, they stabilize the human structure relative to the ancestral one. We propose that this stabilization was caused by adaptive evolution.
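The dynamic-programming idea can be sketched in Python as a recursion over subsets of mutations: if the weight of an order is the product of the weights of the intermediates it passes through, the summed weight of all orders that end in subset S is w(S) times the sum over its one-smaller subsets, which takes O(2^k * k) work instead of k!. The weight model (a Boltzmann-like factor of a caller-supplied folding energy per intermediate, with kT of about 0.616 kcal/mol near 37 °C), the function names and the toy energy are assumptions, not the mutationOrder implementation.

```python
# Hypothetical sketch of scoring mutation orders by dynamic programming over
# subsets; not the mutationOrder tool. Assumed weight model: each intermediate
# sequence (a set of applied mutations) contributes exp(-energy / kT), where
# `energy` is a caller-supplied folding energy in kcal/mol.
from itertools import combinations, permutations
from math import exp
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

State = FrozenSet[int]
Energy = Callable[[State], float]
KT = 0.616  # approx. RT in kcal/mol near 37 degrees C


def path_weight(order: Sequence[int], energy: Energy, kT: float = KT) -> float:
    """Product of the weights of the intermediates visited along one order."""
    applied: set = set()
    weight = 1.0
    for m in order:
        applied.add(m)
        weight *= exp(-energy(frozenset(applied)) / kT)
    return weight


def partition_function(k: int, energy: Energy, kT: float = KT) -> float:
    """Summed weight of all k! orders in O(2^k * k): Z(S) = w(S) * sum_m Z(S - {m})."""
    z: Dict[State, float] = {frozenset(): 1.0}
    for size in range(1, k + 1):  # smaller subsets are always filled in first
        for subset in combinations(range(k), size):
            s = frozenset(subset)
            z[s] = exp(-energy(s) / kT) * sum(z[s - {m}] for m in s)
    return z[frozenset(range(k))]


def order_probabilities(k: int, energy: Energy,
                        kT: float = KT) -> Dict[Tuple[int, ...], float]:
    """Probability of every order; explicit enumeration, so only for small k."""
    z = partition_function(k, energy, kT)
    return {o: path_weight(o, energy, kT) / z for o in permutations(range(k))}


def co_optimal(probs: Dict[Tuple[int, ...], float],
               rel_tol: float = 1e-9) -> List[Tuple[int, ...]]:
    """All orders whose probability ties with the best one."""
    best = max(probs.values())
    return sorted(o for o, p in probs.items() if p >= best * (1 - rel_tol))


def toy_energy(s: State) -> float:
    """Toy model: mutation 0 stabilizes the structure by 1 kcal/mol, others are neutral."""
    return -1.0 if 0 in s else 0.0


if __name__ == "__main__":
    probs = order_probabilities(3, toy_energy)
    print(co_optimal(probs))  # the two orders that apply mutation 0 first
```

Explicit enumeration in `order_probabilities` is only practical for a handful of mutations. For 18 substitutions, as in HAR1, 18! orders are far too many to list. Energies would also need to be taken relative to the ancestral structure, or handled in log space, so that the products do not overflow.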
With the new methods we developed and our analysis of primate databases, we gained new insight into the adaptive evolution of human lncRNAs.