Analysis of the next generation sequencing results: Discovery of disease-associated single nucleotide variants and fusion genes by high throughput analysis

Bibliographic Details
Main Authors: Jui-Tse Hsu, 徐瑞澤
Other Authors: Ueng-Cheng Yang
Format: Others
Language: zh-TW
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/30179719060150810849
Description
Summary: Master's === National Yang-Ming University === Institute of Biomedical Informatics === 99 === With the coming of the post-genomic era, scientists around the world are studying disease genes and the genotype of each individual. Next generation sequencing technology allows researchers to obtain very high-throughput genotype data in a relatively short time, and it also reduces the time needed to conduct a research project. Although the data output rate is constantly improving, integrating these genotype data with known experimental data, and using them to find interesting biological phenomena, depends on the development of adequate computational algorithms and analytical pipelines. To use these high-throughput sequencing data effectively, I have assembled two pipelines for next generation sequencing data, one for DNA sequencing data and the other for RNA sequencing data. The DNA sequencing pipeline links a personal genomic sequence with public-domain genotype data. It integrates Ensembl single nucleotide variation data, dbSNP data, and reference genotype data into the analysis. It also links known disease data from OMIM (Online Mendelian Inheritance in Man) to the known gene variations of each individual. For the RNA sequencing data, I have developed a fusion gene detector pipeline that discovers chromosome and gene aberrations. It can find gene rearrangement events such as chromosome insertion, deletion, inversion, duplication, and translocation, as well as intra-chromosome and inter-chromosome gene fusion events. I use this pipeline to examine the relationship between gene fusion events and changes in the encoded protein, such as promoter changes involving housekeeping genes, the loss of apoptosis genes, and changes of reading frame.
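As a rough illustration of the fusion-event categories described above, the following Python sketch labels a candidate junction as intra- or inter-chromosome and checks whether the reading frame is preserved across it. It is not the thesis pipeline: the `Breakpoint` record, its fields, the function names, and the example genes and coordinates are all hypothetical, and the frame test assumes that the coding offsets on both sides of the junction are already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """One side of a candidate fusion junction (hypothetical record layout)."""
    gene: str
    chrom: str
    pos: int         # genomic coordinate of the junction
    strand: str      # "+" or "-"
    cds_offset: int  # coding bases of this gene that lie upstream of the junction


def classify_fusion(five_prime: Breakpoint, three_prime: Breakpoint) -> str:
    """Label a candidate by the chromosomes and strands of its two partners."""
    if five_prime.chrom != three_prime.chrom:
        return "inter-chromosome"
    if five_prime.strand != three_prime.strand:
        # Same chromosome but opposite strands, as an inversion could produce.
        return "intra-chromosome, opposite strands"
    return "intra-chromosome"


def keeps_reading_frame(five_prime: Breakpoint, three_prime: Breakpoint) -> bool:
    """True if the 3' partner resumes in the same codon phase it had originally.

    The 5' partner contributes `cds_offset` coding bases; the 3' partner skips
    its own first `cds_offset` coding bases. The frame is kept when both
    offsets leave the same remainder modulo 3.
    """
    return five_prime.cds_offset % 3 == three_prime.cds_offset % 3


# Made-up example values, for illustration only.
left = Breakpoint("GENE_A", "chr1", 1_200_000, "+", 300)
right = Breakpoint("GENE_B", "chr7", 5_400_000, "+", 121)

print(classify_fusion(left, right))      # inter-chromosome
print(keeps_reading_frame(left, right))  # False (300 % 3 == 0, 121 % 3 == 1)
```

A real detector would first have to derive such breakpoints from split or discordant read alignments; that step is outside the scope of this sketch.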
By running high-throughput sequencing data through our analytical pipelines, we have found tens of thousands of single nucleotide variations between individuals. These variations may explain differences between individuals and may affect susceptibility to certain diseases. We have used the pipelines to discover complex gene variations in cancer genomes, including chromosome rearrangements, single nucleotide variations, and abnormal gene splicing and joining patterns.
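The linking step described for the DNA pipeline, in which an individual's variants are matched against known variation and disease records, can be sketched as a simple lookup. This is a minimal illustration under stated assumptions, not the author's implementation: the in-memory dictionaries stand in for resources such as dbSNP, Ensembl, and OMIM, and every name and value below (`annotate_variants`, `known_variant_1`, `GENE_A`, `example disorder`) is hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

# A variant is identified here by (chromosome, position, reference, alternate).
VariantKey = Tuple[str, int, str, str]


class AnnotatedVariant(NamedTuple):
    key: VariantKey
    known_id: Optional[str]  # identifier in a known-variant table, if any
    gene: Optional[str]      # gene overlapping the variant, if any
    disease: Optional[str]   # disease linked to that gene, if any


def annotate_variants(
    sample_variants: List[VariantKey],
    known_ids: Dict[VariantKey, str],
    gene_of: Dict[VariantKey, str],
    disease_of_gene: Dict[str, str],
) -> List[AnnotatedVariant]:
    """Attach known-variant, gene, and disease labels to each sample variant."""
    annotated = []
    for key in sample_variants:
        gene = gene_of.get(key)
        disease = disease_of_gene.get(gene) if gene is not None else None
        annotated.append(AnnotatedVariant(key, known_ids.get(key), gene, disease))
    return annotated


# Made-up example tables, for illustration only.
sample = [("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T")]
known = {("chr1", 1000, "A", "G"): "known_variant_1"}
genes = {("chr1", 1000, "A", "G"): "GENE_A"}
diseases = {"GENE_A": "example disorder"}

for row in annotate_variants(sample, known, genes, diseases):
    status = row.known_id or "novel"
    print(row.key, status, row.gene, row.disease)
```

Variants without an entry in the known-variant table are reported as "novel" here; at the scale of tens of thousands of variants per individual, a production pipeline would more likely use indexed files or a database than Python dictionaries.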