Summary: | Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Includes bibliographical references (leaves 81-86). === Markov models of sequence evolution are a fundamental building block for making inferences in biological research. This thesis reviews several major techniques developed to estimate parameters of Markov models of sequence evolution and presents a new approach for evaluating and comparing estimation techniques. Current methods for evaluating estimation techniques require sequence data from populations with well-known phylogenetic relationships. Such data is not always available, since phylogenetic relationships can never be known with certainty. We propose generating sequence data for evaluating estimation techniques by simulating sequence evolution in a controlled setting. Our elementary simulator uses a Markov model and a binary branching process, which dynamically builds a phylogenetic tree from an initial seed sequence. The sequences at the leaves of the tree can then be used as input to estimation techniques. We demonstrate our evaluation approach on Arvestad and Bruno's estimation method and show how our approach can reveal performance variations empirically. The results of our simulation can be used as a guide for improving estimation techniques. === by Keyuan Xu. === M.Eng.
|
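The simulator described in the summary (a Markov substitution model combined with a binary branching process that grows a tree from a seed sequence) can be illustrated with a minimal sketch. This is not the thesis's implementation. The function names, the Jukes-Cantor-style substitution matrix, the per-step substitution probability `mu`, and the fixed branching depth are all assumptions chosen for brevity; the record does not specify any of them.

```python
import random

ALPHABET = "ACGT"

def jc_matrix(mu):
    """Assumed Jukes-Cantor-style step matrix: a site keeps its base with
    probability 1 - mu, otherwise moves to one of the other three uniformly."""
    return {a: {b: (1 - mu if a == b else mu / 3) for b in ALPHABET}
            for a in ALPHABET}

def evolve(seq, P, rng):
    """Apply one Markov step independently to every site of seq."""
    return "".join(
        rng.choices(ALPHABET, weights=[P[c][b] for b in ALPHABET])[0]
        for c in seq
    )

def simulate(seed_seq, depth, P, rng):
    """Grow a binary tree of the given depth (a simplifying assumption).
    Each node is (sequence, left, right); leaves have left = right = None."""
    if depth == 0:
        return (seed_seq, None, None)
    # Each child evolves independently from the parent sequence.
    left = simulate(evolve(seed_seq, P, rng), depth - 1, P, rng)
    right = simulate(evolve(seed_seq, P, rng), depth - 1, P, rng)
    return (seed_seq, left, right)

def leaves(tree):
    """Collect the leaf sequences, left to right."""
    seq, left, right = tree
    if left is None:
        return [seq]
    return leaves(left) + leaves(right)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is reproducible
    tree = simulate("ACGTACGTAC", depth=3, P=jc_matrix(0.05), rng=rng)
    for s in leaves(tree):
        print(s)
```

Because the tree and the substitution matrix are known exactly in such a simulation, the leaf sequences can be handed to an estimation method and its output compared against the true parameters, which is the evaluation idea the summary describes.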