Summary: | Master's === National Central University === Graduate Institute of Computer Science and Information Engineering === 88 === The Human Genome Project began in 1988, and many more genomes will be sequenced in the years to follow. Repeat sequences in genomes play an important role in medical diagnosis and research. The transcription factor database TRANSFAC collects many promoter classes. In this thesis, we first mark the transcription factor binding sites in the repeat sequences and then apply data mining techniques to mine association rules from the combinations of binding sites. We then prune the discovered associations to remove the insignificant ones and obtain a set of useful rules. Finally, we use the discovered association rules to partially classify the repeat sequences in our repeat database. We also run experiments on several genomes, including C. elegans, human chromosome 22, and yeast.
|