Data Mining for Regulatory Elements in Repeat Sequences


Bibliographic Details
Main Authors: Wen-Fu Cho, 卓文福
Other Authors: Jorng-Tzong Horng
Format: Others
Language: en_US
Published: 2000
Online Access: http://ndltd.ncl.edu.tw/handle/95026997305077670391
Description
Summary: Master's === National Central University === Institute of Computer Science and Information Engineering === 88 === The Human Genome Project began in 1988, and many genomes have been sequenced since. Repeat sequences in genomes play an important role in medical diagnosis and research. The transcription factor database TRANSFAC catalogs many promoter classes. In this thesis, we first mark the transcription factor binding sites in the repeat sequences and then apply data mining techniques to mine association rules from the combinations of binding sites. We then prune the discovered associations to remove insignificant ones and obtain a set of useful rules. Finally, we use the discovered association rules to partially classify the repeat sequences in our repeat database. We also run experiments on several genomes, including C. elegans, human chromosome 22, and yeast.
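The pipeline described in the summary (mark binding sites, mine association rules, prune, then classify) can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes each repeat sequence has already been reduced to a set of binding-site labels, uses a brute-force itemset search, and prunes by support, confidence, and lift, which are standard association-rule measures chosen here as an assumption. The function names, thresholds, and toy data are all hypothetical.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.4, min_confidence=0.8, max_size=3):
    """Brute-force association rule mining over sets of binding-site labels.

    Each transaction is the set of sites marked in one repeat sequence.
    Returns (lhs, rhs, support, confidence, lift) tuples.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    # Step 1: collect every itemset (up to max_size) that is frequent enough.
    support = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            count = sum(1 for t in transactions if set(combo) <= t)
            if count / n >= min_support:
                support[frozenset(combo)] = count / n

    # Step 2: split each frequent itemset into lhs -> rhs and prune weak rules.
    rules = []
    for itemset, supp in support.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                rhs = itemset - lhs
                # Subsets of a frequent itemset are frequent, so lookups succeed.
                conf = supp / support[lhs]
                lift = conf / support[rhs]
                if conf >= min_confidence and lift > 1.0:
                    rules.append(
                        (tuple(sorted(lhs)), tuple(sorted(rhs)), supp, conf, lift)
                    )
    return rules


def matching_rules(sites, rules):
    """Return the rules whose lhs and rhs sites all occur in one sequence."""
    return [r for r in rules if set(r[0]) | set(r[1]) <= set(sites)]


# Hypothetical toy data: one set of marked binding sites per repeat sequence.
transactions = [
    {"SP1", "AP-1", "TATA"},
    {"SP1", "AP-1"},
    {"SP1", "AP-1", "GATA-1"},
    {"TATA", "GATA-1"},
    {"TATA"},
]
rules = mine_rules(transactions)
for lhs, rhs, supp, conf, lift in rules:
    print(lhs, "->", rhs, f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A sequence can then be partially classified by the rules it satisfies, for example `matching_rules({"SP1", "AP-1"}, rules)`. Brute-force enumeration is only practical for small label sets; an Apriori-style search would be the usual choice at genome scale.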