Sequence-based Comparative Analysis of Human Gene Loci

PhD === National Yang-Ming University === Institute of Biomedical Informatics === 99 === Elucidating human gene function is a fundamental issue in biomedical studies. It relies on well-annotated orthology to generalize findings from model organisms to human. Following the completion of the Human Genome Project in 2003, ortholog da...

Bibliographic Details
Main Authors: Meng-Ru Ho, 何孟如
Other Authors: Wen-chang Lin
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/72803520153203778035
id ndltd-TW-099YM005114002
record_format oai_dc
collection NDLTD
language en_US
format Others
sources NDLTD
description PhD === National Yang-Ming University === Institute of Biomedical Informatics === 99 === Elucidating human gene function is a fundamental issue in biomedical studies. It relies on well-annotated orthology to generalize findings from model organisms to human. Following the completion of the Human Genome Project in 2003, ortholog databases have been systematically established through comparative genomic analysis of human genes. However, existing ortholog databases have been shown to contain ambiguous eukaryotic ortholog relationships and to confuse alternatively spliced protein isoforms with in-paralogs. To solve this problem, this dissertation introduces a new methodology for designating eukaryotic orthology via processed transcription units (PTUs). The main idea is to use the genomic locations of transcripts to cluster alternative splicing (AS)-derived isoforms before identifying orthologs by best reciprocal hit (BRH). As a result, each gene is represented by a single PTU as its sequence unit, except for overlapped or embedded genes. After performing BRH on PTUs, we additionally apply three syntenic fitting rules to avoid losing orthologs of genes with unconserved overlapped/embedded structures and of tandem in-paralogs. Using human/mouse as a prototype, we demonstrate that more than 90% of the identified orthologs are consistent with existing databases. In addition, the coverage of the delineated human/mouse orthologs increases to 80%, whereas public databases cover less than 66% of human/mouse orthologs. Based on our methodology, we have further constructed the Gene-Oriented Ortholog Database (GOOD, available at http://goods.ibms.sinica.edu.tw/GOODs/) for four well-annotated vertebrates: Homo sapiens, Pan troglodytes, Mus musculus, and Bos taurus. We present orthology from the gene point of view. That is, a genomic region that can be transcribed into a functional element in cells is defined as a gene.
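The two steps named in the abstract, clustering transcripts by genomic location into PTUs and then taking best reciprocal hits between species, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the overlap rule (same chromosome and strand, overlapping coordinates), the function names `cluster_into_ptus`, `best_hit`, and `best_reciprocal_hits`, and all identifiers and scores in the example are assumptions made for illustration. The three syntenic fitting rules are not modeled.

```python
from collections import defaultdict


def cluster_into_ptus(transcripts):
    """Group overlapping transcripts on the same chromosome and strand.

    transcripts: iterable of (transcript_id, chrom, strand, start, end).
    Returns a dict mapping a hypothetical PTU label to its transcript ids.
    The overlap criterion is an assumption, not the thesis's exact rule.
    """
    by_locus = defaultdict(list)
    for tid, chrom, strand, start, end in transcripts:
        by_locus[(chrom, strand)].append((start, end, tid))

    ptus = {}
    for _, items in sorted(by_locus.items()):
        items.sort()
        current, current_end = [], -1
        for start, end, tid in items:
            # A gap between transcripts closes the current cluster.
            if current and start > current_end:
                ptus[f"PTU{len(ptus) + 1}"] = current
                current, current_end = [], -1
            current.append(tid)
            current_end = max(current_end, end)
        if current:
            ptus[f"PTU{len(ptus) + 1}"] = current
    return ptus


def best_hit(scores):
    """Map each query to its highest-scoring target.

    scores: dict {(query, target): similarity_score}.
    On ties, the first pair encountered is kept.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def best_reciprocal_hits(scores_ab, scores_ba):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hit(scores_ab)
    best_ba = best_hit(scores_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical transcripts: tx1 and tx2 overlap, so they share one PTU.
example_transcripts = [
    ("tx1", "chr1", "+", 100, 500),
    ("tx2", "chr1", "+", 300, 800),
    ("tx3", "chr1", "+", 2000, 2500),
    ("tx4", "chr1", "-", 150, 450),
]
example_ptus = cluster_into_ptus(example_transcripts)

# Hypothetical similarity scores between human (H*) and mouse (M*) PTUs.
human_vs_mouse = {("H1", "M1"): 950.0, ("H1", "M2"): 400.0,
                  ("H2", "M2"): 700.0, ("H3", "M2"): 650.0}
mouse_vs_human = {("M1", "H1"): 940.0, ("M2", "H2"): 710.0,
                  ("M2", "H3"): 640.0}
example_orthologs = best_reciprocal_hits(human_vs_mouse, mouse_vs_human)
```

In this toy data, H3 also hits M2 but is not M2's best hit, so it gets no ortholog from BRH alone. Cases like this one (for example tandem in-paralogs) are what the abstract's syntenic fitting rules are described as recovering.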
Orthologous genes of two species therefore refer to genomic regions inherited from a common ancestor rather than to individual protein sequences. Accordingly, the two lists of annotated transcripts from a pair of orthologous genes show the transcriptomic differences between them. Moreover, GOOD includes functional annotation from the Gene Ontology (GO) database. GO terms are listed and supported by GO graphs that reveal the hierarchical relationships among different functions. Because of its more comprehensive presentation of orthology and its integration of functional annotation, GOOD can assist researchers in interpreting molecular functions observed in a model organism and generalizing the results to human. During the ortholog identification process, we observed that well-conserved duplicated genes and newly duplicated genes are widely distributed in the human genome. We use reference transcripts as the repeated units to annotate duplicated gene loci (DGL) that share more than 95% identical exonic sequences. This DGL annotation not only flags the indistinguishable orthologs of well-conserved duplicated genes (out-paralogs) and newly duplicated genes (in-paralogs) but also indicates gene duplicates that may disturb sequence-based genetic analyses with false-positive SNP calls. For instance, a single-base difference between nearly identical duplicated segments may be misjudged as a single nucleotide polymorphism (SNP) between individuals. We have catalogued the nucleotide variations among DGL to construct the duplicated-gene nucleotide variant database (dbDNV, available at http://goods.ibms.sinica.edu.tw/DNVs/). By combining DNV annotation with the SNP information from dbSNP, we believe that dbDNV can promote more accurate and informative SNP/mutation annotations for duplicated genes. 
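The 95% identity criterion and the idea of a duplicated-gene nucleotide variant can be made concrete with a short sketch. It assumes the two exonic sequences have already been aligned to equal length (gaps written as "-") and that identity is computed over all aligned columns. The abstract does not state how identity was measured, so both choices, along with the function names `percent_identity` and `duplicated_gene_variants`, are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Fraction of aligned columns where both sequences carry the same base.

    Assumes a precomputed pairwise alignment with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return matches / len(aligned_a)


def duplicated_gene_variants(aligned_a, aligned_b, min_identity=0.95):
    """List single-base differences between two duplicated loci.

    Returns None when identity is not above min_identity, meaning the pair
    does not qualify as duplicated gene loci under the assumed criterion.
    Otherwise returns (0-based position, base_in_a, base_in_b) tuples.
    """
    if percent_identity(aligned_a, aligned_b) <= min_identity:
        return None
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip(aligned_a, aligned_b))
        if x != y and x != "-" and y != "-"
    ]


# Hypothetical 40-base exonic copies that differ at a single position.
copy_a = "ACGT" * 10
copy_b = copy_a[:17] + "A" + copy_a[18:]
example_variants = duplicated_gene_variants(copy_a, copy_b)
```

A read from `copy_b` that is mapped onto `copy_a` would show an apparent C-to-A change at position 17. This is the kind of difference between gene copies that the abstract warns may be misjudged as a SNP between individuals.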
With the advancement of high-throughput sequencing technologies, information technology is needed to support the systematic analysis and integration of diverse biological data. In this dissertation, we have illustrated that large-scale information analysis can resolve biological issues in ortholog identification and SNP detection through our sequence-based comparative analysis of human gene loci. We expect that the proposed databases, GOOD and dbDNV, will help researchers draw more accurate inferences from experimental results.
spelling ndltd-TW-099YM005114002 2015-10-13T20:37:07Z http://ndltd.ncl.edu.tw/handle/72803520153203778035 Sequence-based Comparative Analysis of Human Gene Loci 人類基因位點之序列基礎上的比較分析 Meng-Ru Ho 何孟如 PhD National Yang-Ming University Institute of Biomedical Informatics 99 Wen-chang Lin 林文昌 2011 thesis 78 en_US