Large-scale Orthology Detection

Bibliographic Details
Main Authors: Chung, Jen-Chun, 鐘仁駿
Other Authors: Lin, Tiao-Yin
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/99703120349032361040
Description
Summary: Master's === National Chiao Tung University === Institute of Bioinformatics and Systems Biology === 100 === The rapid development of genome sequencing technology has resulted in unprecedented growth in the amount of genome sequence data. However, current experimental methods for identifying gene function cannot keep pace with today's high-throughput sequencing technology, so the functions of the genes in many sequenced genomes are still unknown. It has been reported that orthologous genes in different species should have the same function. Hence, identifying orthologous genes helps predict gene functions in sequenced genomes. Recently, a method called QuartetS has been proposed for large-scale orthology detection. QuartetS first finds the paralogous genes and then treats the genes that are not paralogous as orthologous. To determine whether two genes, say x and y, from two different species are paralogous, QuartetS first constructs a quartet gene tree from these two genes and two paralogous genes, say z1 and z2, from a third species. It then uses an approximate method to determine the location of the root in the quartet gene tree. If the predicted root lies on the inner edge of the quartet tree, x and y are considered paralogous. Otherwise, QuartetS takes another pair of paralogous genes as z1 and z2 and repeats the procedure. If none of the pre-prepared pairs of paralogous genes can show that x and y are paralogous, x and y are considered a pair of orthologous genes. The shortcomings of QuartetS are that it assumes a constant mutation rate during species evolution and that it estimates the location of the root in the quartet tree with an approximate method. In this study, we make the following modifications to improve QuartetS: (i) the mutation rate of species is not assumed to be constant; (ii) the location of the root in the quartet gene tree is predicted by adding a fifth gene o that is an outgroup gene with respect to genes x, y, z1 and z2. Finally, experimental results show that our improved QuartetS method distinguishes paralogous genes from orthologous genes better than the original QuartetS.
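
The decision procedure summarized in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names (classify_pair, root_on_inner_edge), the representation of a five-leaf tree by its two cherries, and the string labels are all hypothetical, and details not given in the abstract (tree construction, which quartet topologies qualify, handling of mutation rates) are not modeled. The sketch relies on one general fact: an unrooted tree on five leaves has two cherries (sibling leaf pairs) and one middle leaf, so the outgroup o attaches to the inner edge of the quartet on x, y, z1, z2 exactly when o belongs to neither cherry.

```python
from typing import Callable, Iterable, Sequence, Set, Tuple

Gene = str


def root_on_inner_edge(cherries: Sequence[Set[Gene]], outgroup: Gene) -> bool:
    """Return True if the outgroup attaches to the quartet's inner edge.

    Assumption: `cherries` holds the two sibling leaf pairs of an unrooted
    five-leaf tree built from x, y, z1, z2 and the outgroup. If the outgroup
    sits in a cherry, it attaches to a leaf edge of the quartet instead.
    """
    return all(outgroup not in cherry for cherry in cherries)


def classify_pair(
    x: Gene,
    y: Gene,
    paralog_pairs: Iterable[Tuple[Gene, Gene]],
    root_is_inner: Callable[[Gene, Gene, Gene, Gene], bool],
) -> str:
    """Try each prepared paralogous pair (z1, z2) from a third species.

    `root_is_inner` is a hypothetical callback that builds the tree for
    (x, y, z1, z2) and reports whether the predicted root lies on the
    inner edge. The first pair for which it does marks x and y paralogous;
    if no pair does, x and y are treated as orthologous.
    """
    for z1, z2 in paralog_pairs:
        if root_is_inner(x, y, z1, z2):
            return "paralogous"
    return "orthologous"
```

In use, root_is_inner would wrap whatever tree-building step is chosen and could call root_on_inner_edge on the resulting five-leaf tree; passing it in as a callback keeps the loop independent of how the root is located.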