Sequential Optimization of Global Sequence Alignments Relative to Different Cost Functions

Bibliographic Details
Main Author: Odat, Enas M.
Other Authors: Moshkov, Mikhail
Language: en
Published: 2013
Online Access: http://hdl.handle.net/10754/271672
Description
Summary: The purpose of this dissertation is to present a methodology for modeling the global sequence alignment problem as a directed acyclic graph, which makes it possible to extract all optimal alignments. In addition, a mechanism is suggested for sequentially optimizing the sequence alignment problem relative to different cost functions. Sequence alignment is of central importance in computational biology, where it is used to find evolutionary relationships between biological sequences. Many algorithms have been developed to solve this problem. The best known are Needleman-Wunsch and Smith-Waterman, both of which are based on dynamic programming. In dynamic programming, a problem is divided into a set of overlapping subproblems, the solution of each subproblem is found, and the solutions to these subproblems are then combined into a final solution. The thesis proves that, for two sequences of lengths m and n over a fixed alphabet, the suggested optimization procedure requires O(mn) arithmetic operations per cost function on a single-processor machine. The algorithm was implemented in the C#.NET programming language, and a number of experiments were carried out to verify the proved statements. The results of these experiments show that the number of optimal alignments is reduced after each step of optimization. Furthermore, they confirm that, as the sequence length increases linearly, the number of optimal alignments increases exponentially and also depends on the cost function used. Finally, the number of executed operations increases polynomially as the sequence length increases linearly.
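
The idea summarized above can be illustrated with a minimal Python sketch. It is not the dissertation's construction, which the abstract does not spell out. It assumes a prefix-based dynamic programming grid whose cells are the nodes of the directed acyclic graph, and two illustrative cost functions, `edit_cost` and `gap_cost`. The helper `optimize` and all other names are hypothetical. Each call does a constant amount of work per grid cell, which is consistent with the O(mn) bound per cost function stated in the summary.

```python
def edit_cost(a, b, src, dst):
    """Assumed cost function 1: 0 for a match, 1 for a mismatch or a gap."""
    (pi, pj), (i, j) = src, dst
    if i - pi == 1 and j - pj == 1:          # diagonal move: align a[pi] with b[pj]
        return 0 if a[pi] == b[pj] else 1
    return 1                                  # vertical or horizontal move: a gap


def gap_cost(a, b, src, dst):
    """Assumed cost function 2: counts gaps only."""
    (pi, pj), (i, j) = src, dst
    return 0 if (i - pi == 1 and j - pj == 1) else 1


def optimize(a, b, cost, allowed=None):
    """One optimization step over the alignment grid.

    Returns (optimal cost, number of optimal alignments, kept edges).
    `allowed` restricts the graph to edges kept by a previous step;
    None means the full grid graph.
    """
    m, n = len(a), len(b)
    best = {(0, 0): 0}    # optimal cost of aligning a[:i] with b[:j]
    count = {(0, 0): 1}   # number of optimal paths from (0, 0) to (i, j)
    kept = set()          # edges that achieve the optimum at their target cell
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            for src in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                edge = (src, (i, j))
                if src not in best:
                    continue                  # off the grid or unreachable
                if allowed is not None and edge not in allowed:
                    continue                  # removed by an earlier step
                candidates.append((best[src] + cost(a, b, src, (i, j)), edge))
            if not candidates:
                continue
            best[(i, j)] = min(c for c, _ in candidates)
            optimal = [e for c, e in candidates if c == best[(i, j)]]
            kept.update(optimal)
            count[(i, j)] = sum(count[e[0]] for e in optimal)
    return best[(m, n)], count[(m, n)], kept


a, b = "AB", "BA"
cost1, n1, dag1 = optimize(a, b, edit_cost)                 # step 1: full graph
cost2, n2, dag2 = optimize(a, b, gap_cost, allowed=dag1)    # step 2: restricted graph
print(cost1, n1)  # 2 3
print(cost2, n2)  # 0 1
```

In this toy run, three alignments are optimal under the first cost function and only one of them remains after optimizing relative to the second, mirroring the reported reduction in the number of optimal alignments after each step.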