Summary: | RNA splicing has enabled a dramatic increase in species complexity. Splicing occurs in over 95% of mammalian genes, allowing the development of exceptional cellular diversity without an increase in raw gene numbers. This is highlighted by the fact that humans and nematodes have a similar number of genes (20,000 human genes versus 19,000 genes in Caenorhabditis elegans). Although the mechanistic process of splicing is now well understood, there remains a multitude of unexplored dynamics that have only become visible with the power of next generation sequencing (NGS). The human brain is one of the best examples of an intricate cellular structure. Neuronal cell types are incredibly diverse and specialised, and are regulated through various transcriptional mechanisms. Recently, long genes (150kb+) have been implicated as crucial to neuronal function, and their impairment has been linked to several neurological disorders. I explore this relationship further by showing that long genes are more highly expressed in the brain than in other tissues. Long genes are also distinct in that they are deficient in H3K36me3, a histone mark largely associated with splicing and active transcription. Through analysis of brain RNA-seq data, a novel splicing mechanism known as recursive splicing was identified in long introns. Recursive splice sites (RSS) consist of an intronic 3’ splice site followed immediately by a 5’ splice site. These sites result in a zero-length exon that regulates the use of cryptic promoters, ensuring only the functional isoform is expressed. This discovery led me to question whether other non-canonical forms of splicing are common in the brain. Backsplicing is a recently discovered splicing mechanism pervasive across the tree of life. It occurs when the 3’ end of a downstream exon is spliced onto the 5’ end of an upstream exon, resulting in a circular RNA molecule (hereafter: circRNA). circRNAs are enriched in neuronal genes and mediated by RNA-binding factors. 
I have identified and quantified circRNAs within the brain, including a large number of highly expressed novel circRNAs. From these findings I identify a subset of highly expressed backsplice junctions that occur between two proximal genes from the same family. To understand the function of these splicing reactions I inspected the splicing features themselves, namely the 5’ and 3’ splice sites and the branchpoint. The branchpoint remains a poorly characterised feature, and until recently very few had been experimentally validated. I explore these features using data from the ExAC and UCLex consortia, applying cumulative variant ratios to annotate invariant positions within the branchpoint and splice sites. Having identified invariant positions, I then investigated how variation impacts splicing efficiency by integrating whole exome and RNA sequencing data from the GEUVADIS consortium. The findings show that exon expression is a poor indicator of splicing dysfunction, with three-fold lower sensitivity than direct analysis of splice junction reads. I also devise a variant effect score that captures a significant portion of the change in splice site efficiency, enabling improved prediction of deleterious variants. Together, this thesis hints at the massive potential of NGS to investigate the diversity of splicing-related features while identifying novel features that could be implicated in neurological dysfunction.
|