Exploiting cross-species conserved oligonucleotides to identify gene promoters in plants

Bibliographic Details
Main Authors: Hsing-Juei Huang, 黃祥瑞
Other Authors: Ya-Ting Chao
Format: Others
Language: zh-TW
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/88877225143749632122
Description
Summary: Master's === Yuan Ze University (元智大學) === Master's Program in Biological and Medical Informatics (生物與醫學資訊碩士學位學程) === 99 === A promoter is a region of the DNA sequence that recruits RNA polymerase to bind upstream of a gene and start transcription. Promoters differ from non-promoter DNA in sequence and in structure; even the folding energy of their nucleotide sequences differs from that of non-promoter sequences. Correctly identifying promoter locations has become an important issue in genome biology. This study focuses on Arabidopsis thaliana, which has 33,200 gene models, and constructs the experimental dataset after filtering the data. Orthologous sequences from Arabidopsis thaliana and Oryza sativa are used to find evolutionarily conserved regions. The 6-, 8-, and 10-mer sequences in the conserved regions are combined with Match (a motif finder) to form the training features. Finally, the QuickRBF classifier and 5-fold cross-validation are used to construct the prediction model. Even on an unbalanced dataset, the model achieves a balanced accuracy of 78.02% on independent data.
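The summary names two ideas that a short sketch may make more concrete: k-mer features of length 6, 8, and 10, and balanced accuracy as the metric for an unbalanced dataset. The Python below is illustrative only and is not the thesis's implementation. The function names `kmer_counts` and `balanced_accuracy` are hypothetical, counting overlapping k-mers is an assumed way of turning a sequence into features, and balanced accuracy is assumed to mean the usual average of sensitivity and specificity.

```python
from collections import Counter


def kmer_counts(seq, ks=(6, 8, 10)):
    """Count overlapping k-mers of each length in `ks` within `seq`.

    Assumption: features are raw k-mer counts; the record does not
    say how the conserved oligonucleotides were encoded.
    """
    seq = seq.upper()
    counts = Counter()
    for k in ks:
        # A sequence shorter than k yields an empty range, so no k-mers.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity for binary labels.

    Labels are assumed to be 1 (promoter) and 0 (non-promoter), and
    both classes must be present in `y_true`.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


# Toy example: 2 promoters among 10 sequences (made-up labels).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(labels, predicted)
features = kmer_counts("ACGTACGTAC")
```

On the toy labels, plain accuracy would be 8 of 10 correct, while the balanced score weights the two classes equally and so penalizes the missed promoter more heavily. That is the usual reason to report balanced accuracy when negatives far outnumber positives.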