Summary: | The central task in phylogenetics is to infer the evolutionary relationships among a given set of species. These relationships are usually represented by a phylogenetic tree in which the species of interest are the leaves and the internal vertices represent ancestral species. The amount of available molecular data is increasing exponentially and, given the continual advances in sequencing techniques and throughput, this growth will likely continue. With so much data available, biologists are able to assemble large multi-gene datasets for use in phylogenetic analyses, which presents distinct computational challenges. Supertree methods are one approach to reconstructing large phylogenies, given estimated trees for overlapping subsets of the entire set of taxa. These source trees are combined into a single supertree on the full set of taxa using various algorithmic techniques. When the data allow, the competing approach is a combined analysis (also known as a “super-matrix” or “total evidence” approach), in which the sequence data matrices for the different subsets of taxa are concatenated into a single super-matrix, and a tree is estimated on that super-matrix. In this dissertation, I present simulation software I designed to allow users to compare the relative performance of different supertree methods, as well as that of combined analysis, on more realistic data and on a larger scale than has been used to date. I present an extensive simulation study that uses this software to compare the performance of supertree methods and combined analysis, and that demonstrates a need for more topologically accurate supertree methods. I also introduce a new supertree method that outperforms the most commonly used supertree method, which until now has arguably also been the most accurate. === text
|