RAD and the demographic history of a hybrid zone : new insights into the evolution of hybrid sterility

Bibliographic Details
Main Author: Kerth, Claudius
Other Authors: Butlin, Roger
Published: University of Sheffield 2018
Subjects:
570
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.745696
Description
Summary: Restriction Site associated DNA (RAD) is a molecular method involving restriction digestion and high-throughput DNA sequencing. It promises the systematic subsampling of the genome and highly repeatable scoring of genetic variation in hundreds of individuals at current sequencing costs. However, it comes with its own problems. De novo assembly of RAD sequence data usually creates many putative reference tags that are found in only one or a few individuals, leaving relatively few markers for population genomic analyses. In the first chapter, I investigate three potential reasons for this outcome (incomplete digestion, genomic religation and insufficient DNA template amount) by looking at the occurrence of restriction enzyme recognition sequences within the resultant sequencing data of two different types of RAD libraries. Analysis of sequence clusters and of the proportion of read pairs mapping concordantly against a Locusta reference sequence suggests that incomplete digestion has affected one of the restriction enzymes used and thereby the number of loci that could be sequenced at sufficient depth across individuals. The other restriction enzyme is found to be much less affected by incomplete digestion; instead, random religation of restriction fragments indicates an inefficient adapter ligation step that also leads to low read depths across individuals. Finally, qPCR and read mapping against four newly reconstructed paired-end (PE) contig pair reference sequences suggest that a low amount of starting DNA and/or a high loss of DNA during library preparation is a major cause of the locus drop-out observed in the de novo assembled read data. In the second part of this thesis, I use RAD sequence data to make inferences about several aspects of the demographic history of two grasshopper subspecies that form a hybrid zone in the Pyrenees between France and Spain.
Sequence data were generated from 36 individuals sampled at the two opposite ends of a hybrid zone that is characterised by hybrid male sterility. I use a state-of-the-art de novo assembly strategy that utilises the shotgun-type PE reads from standard RAD to distinguish alleles from paralogs. I then conduct several population genomic analyses with the programme 'ANGSD', which incorporates uncertainty in genotypes by using genotype likelihoods instead of called genotypes. Results based on more than 1 million filtered sites confirm the high genetic differentiation of the two subspecies found in pre-genomic studies and show a surprisingly high genetic diversity in the subspecies that is thought to be derived from a very distant glacial refuge. Further, demographic modelling with the programme δαδi reveals a robust signal of low but significant gene flow during the divergence of the two subspecies (Nm ≃ 0.471, until about 25 thousand years ago (kya)). Allowing for gene flow roughly doubles the divergence time estimate from about 0.5 to 1.1 million years ago (mya). The divergence time estimate without allowing for gene flow is highly consistent with previous estimates from a mitochondrial sequence marker. A history of divergence with gene flow also indicates that alleles causing Dobzhansky-Muller incompatibilities (DMIs) are unlikely to have risen in frequency by genetic drift alone. The gene flow is clearly asymmetric between the two subspecies, in line with many previous studies of the hybrid zone that indicated asymmetric introgression in the same direction. There is no signal of recent (postglacial) gene flow in the data set. However, this may well be due to a lack of power. Further analysis of this data set promises to yield more insights, for example into loci potentially under divergent selection between the two subspecies.