Sequence-based Comparative Analysis of Human Gene Loci
Doctoral === National Yang-Ming University === Institute of Biomedical Informatics === 99 === Elucidating human gene function is a fundamental issue in biomedical research. It relies on well-annotated orthology to generalize findings from model organisms to humans. Following the completion of the Human Genome Project in 2003, ortholog da...
Main Authors: | Meng-Ru Ho, 何孟如 |
---|---|
Other Authors: | Wen-chang Lin |
Format: | Others |
Language: | en_US |
Published: | 2011 |
Online Access: | http://ndltd.ncl.edu.tw/handle/72803520153203778035 |
id |
ndltd-TW-099YM005114002 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Doctoral === National Yang-Ming University === Institute of Biomedical Informatics === 99 === Elucidating human gene function is a fundamental issue in biomedical research. It relies on well-annotated orthology to generalize findings from model organisms to humans. Following the completion of the Human Genome Project in 2003, ortholog databases have been systematically established through comparative genomic analysis of human genes. However, existing ortholog databases have been shown to contain ambiguous eukaryotic ortholog relationships and to confuse alternatively spliced protein isoforms with in-paralogs. To solve this problem, this dissertation introduces a new methodology for designating eukaryotic orthology via processed transcription units (PTUs). The main idea is to use the genomic locations of transcripts to cluster alternative splicing (AS)-derived isoforms before the best reciprocal hit (BRH) identification of orthology. As a result, each gene has one PTU as its representative sequence unit, except for overlapping or embedded genes. After performing BRH on PTUs, we additionally apply three syntenic fitting rules to avoid losing orthologs of genes with unconserved overlapping/embedded structures and of tandem in-paralogs. Using human/mouse as a prototype, we demonstrate that more than 90% of the identified orthologs are consistent with existing databases. In addition, the coverage of the delineated human/mouse orthologs increases to 80%, whereas public databases cover less than 66% of human/mouse orthologs.
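The two steps named above, clustering isoforms by genomic location and then taking best reciprocal hits, can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names, the transcript dictionary layout, and the precomputed score tables are all hypothetical, and the clustering simply merges overlapping transcripts on the same chromosome and strand, so it does not reproduce the special handling of overlapping or embedded genes or the three syntenic fitting rules.

```python
from collections import defaultdict


def cluster_into_ptus(transcripts):
    """Group transcripts that overlap on the same chromosome and strand.

    Each transcript is a dict with hypothetical keys:
    id, chrom, strand, start, end. Returns a list of ID lists,
    one list per cluster (a stand-in for a PTU).
    """
    by_locus = defaultdict(list)
    for t in transcripts:
        by_locus[(t["chrom"], t["strand"])].append(t)

    ptus = []
    for group in by_locus.values():
        group.sort(key=lambda t: t["start"])
        current, current_end = [], None
        for t in group:
            if current and t["start"] <= current_end:
                # Overlaps the running cluster: treat as another isoform.
                current.append(t["id"])
                current_end = max(current_end, t["end"])
            else:
                if current:
                    ptus.append(current)
                current, current_end = [t["id"]], t["end"]
        if current:
            ptus.append(current)
    return ptus


def best_reciprocal_hits(scores_ab, scores_ba):
    """Return pairs (a, b) where b is a's top hit and a is b's top hit.

    scores_ab[a][b] is an assumed precomputed similarity score of
    unit a (species A) against unit b (species B); scores_ba is the
    reverse search.
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

Clustering before the reciprocal search means that isoforms of one gene compete as a single unit, which is the property the abstract relies on to keep splice variants from being reported as separate paralogs.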
Based on our methodology, we have further constructed the Gene-Oriented Ortholog Database (GOOD, available at http://goods.ibms.sinica.edu.tw/GOODs/) for four well-annotated vertebrates: Homo sapiens, Pan troglodytes, Mus musculus, and Bos taurus. We present orthology from the gene point of view. That is, a gene is defined as a genomic region that can be transcribed into a functional element in cells. Orthologous genes of two species thus indicate genomic regions inherited from a common ancestor, rather than individual protein sequences. Therefore, the two lists of annotated transcripts from two orthologous genes show the transcriptomic differences between them. Moreover, GOOD includes functional annotation from the Gene Ontology (GO) database. GO terms are listed and supported by GO graphs that reveal the hierarchical relationships among divergent functions. With its more comprehensive presentation of orthologs and its integrated functional annotation, GOOD can assist researchers in interpreting molecular functions observed in a model organism and generalizing the results to humans.
During the ortholog identification process, we observed that well-conserved duplicated genes and newly duplicated genes are widely distributed in the human genome. We use reference transcripts as the repeat units to annotate duplicated gene loci (DGL), which share more than 95% identical exonic sequence. This DGL annotation not only flags the indistinguishable orthologs of well-conserved duplicated genes (out-paralogs) and newly duplicated genes (in-paralogs) but also identifies gene duplicates that may disturb sequence-based genetic analyses by producing false-positive SNP calls. For instance, a single-base difference between nearly identical duplicated segments may be misjudged as a single nucleotide polymorphism (SNP) among individuals. We have cataloged the nucleotide variations of DGL to construct the duplicated-gene nucleotide variant database (dbDNV, available at http://goods.ibms.sinica.edu.tw/DNVs/). By combining DNV annotation with the SNP information from dbSNP, we believe that dbDNV can promote more accurate and informative SNP/mutation annotation for duplicated genes.
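As a rough illustration of the 95% identity criterion and of how differences between duplicated copies can be listed, the Python sketch below compares two exonic sequences. It is only an assumption-laden example: the function names are hypothetical, the sequences are assumed to be already aligned, ungapped, and of equal length, and the abstract does not state how the dissertation computes identity.

```python
def percent_identity(seq_a, seq_b):
    """Fraction of matching positions between two pre-aligned sequences.

    Assumes equal-length, ungapped sequences (a simplification).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return matches / len(seq_a)


def duplicated_gene_variants(seq_a, seq_b, threshold=0.95):
    """List differing positions if the two copies exceed the identity threshold.

    Returns None when the pair does not qualify as a duplicated locus
    under the threshold; otherwise returns (position, base_a, base_b)
    tuples. Each tuple marks a site where reads from one copy, mapped
    onto the other, could be mistaken for a SNP.
    """
    if percent_identity(seq_a, seq_b) <= threshold:
        return None
    return [(i, x, y) for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]
```

Positions returned this way could then be compared against reported SNP coordinates to mark calls that may reflect copy-to-copy differences rather than variation among individuals.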
With the advancement of high-throughput sequencing technologies, information technology is required to support the systematic analysis and integration of diverse biological data. In this dissertation, we have shown that large-scale information analysis can resolve biological issues in ortholog identification and SNP detection through sequence-based comparative analysis of human gene loci. We expect that the proposed databases, GOOD and dbDNV, will help researchers make more accurate inferences from experimental results.
|
author2 |
Wen-chang Lin |
author_facet |
Wen-chang Lin Meng-Ru Ho 何孟如 |
author |
Meng-Ru Ho 何孟如 |
spellingShingle |
Meng-Ru Ho 何孟如 Sequence-based Comparative Analysis of Human Gene Loci |
author_sort |
Meng-Ru Ho |
title |
Sequence-based Comparative Analysis of Human Gene Loci |
title_short |
Sequence-based Comparative Analysis of Human Gene Loci |
title_full |
Sequence-based Comparative Analysis of Human Gene Loci |
title_fullStr |
Sequence-based Comparative Analysis of Human Gene Loci |
title_full_unstemmed |
Sequence-based Comparative Analysis of Human Gene Loci |
title_sort |
sequence-based comparative analysis of human gene loci |
publishDate |
2011 |
url |
http://ndltd.ncl.edu.tw/handle/72803520153203778035 |
work_keys_str_mv |
AT mengruho sequencebasedcomparativeanalysisofhumangeneloci AT hémèngrú sequencebasedcomparativeanalysisofhumangeneloci AT mengruho rénlèijīyīnwèidiǎnzhīxùlièjīchǔshàngdebǐjiàofēnxī AT hémèngrú rénlèijīyīnwèidiǎnzhīxùlièjīchǔshàngdebǐjiàofēnxī |
_version_ |
1718049087443435520 |
spelling |
ndltd-TW-099YM005114002 2015-10-13T20:37:07Z http://ndltd.ncl.edu.tw/handle/72803520153203778035 Sequence-based Comparative Analysis of Human Gene Loci 人類基因位點之序列基礎上的比較分析 Meng-Ru Ho 何孟如 Doctoral National Yang-Ming University Institute of Biomedical Informatics 99 Wen-chang Lin 林文昌 2011 學位論文 ; thesis 78 en_US |