Summary: | Master's === Chang Gung University === Graduate Institute of Information Management === 93 === In the post-Human Genome Project era, much research focuses on discovering associations between genetic markers and clinical phenotypes, and finding effective treatments for diseases has become a crucial and attainable goal. Expressed Sequence Tags (ESTs) are widely used in sequence analyses such as gene discovery, polymorphism analysis, and gene prediction. Although ESTs have become a rich sequence resource, they may contain sequencing errors introduced for technical reasons.
In this thesis, we apply a machine learning technique, Hidden Markov Models (HMMs), to identify uneven peak patterns in the electropherograms of ESTs produced by automated sequencing machines. The HMM parameters are trained with the Viterbi Path Counting algorithm under k-fold cross-validation. The automated system is intended to recognize erroneous regions and to capture additional information for the annotation of ESTs. We expect this additional annotation to assist biologists in the study of genomics.
|
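The decoding and counting idea named in the summary can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the two hidden states, the three discretized peak-height symbols, the probability values, and the function names `viterbi` and `count_path`. The counting step is a simplified single-path update with pseudo-counts, and k-fold cross-validation is not shown.

```python
import math

STATES = ("normal", "uneven")      # hypothetical hidden states
SYMBOLS = ("even", "low", "high")  # hypothetical discretized peak-height symbols

# Illustrative parameters only; not the values trained in the thesis.
START = {"normal": 0.9, "uneven": 0.1}
TRANS = {
    "normal": {"normal": 0.9, "uneven": 0.1},
    "uneven": {"normal": 0.2, "uneven": 0.8},
}
EMIT = {
    "normal": {"even": 0.8, "low": 0.1, "high": 0.1},
    "uneven": {"even": 0.2, "low": 0.4, "high": 0.4},
}


def viterbi(obs, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable hidden-state path for obs (log-space Viterbi)."""
    # score[s] = best log-probability of any path ending in state s
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for symbol in obs[1:]:
        pointers, new_score = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(trans[p][s]))
            pointers[s] = prev
            new_score[s] = (score[prev] + math.log(trans[prev][s])
                            + math.log(emit[s][symbol]))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def count_path(obs, path, pseudo=1.0):
    """Re-estimate transition and emission tables by counting along one decoded path."""
    t_count = {p: {s: pseudo for s in STATES} for p in STATES}
    e_count = {s: {o: pseudo for o in SYMBOLS} for s in STATES}
    for prev, cur in zip(path, path[1:]):
        t_count[prev][cur] += 1
    for s, o in zip(path, obs):
        e_count[s][o] += 1

    def normalize(row):
        total = sum(row.values())
        return {k: v / total for k, v in row.items()}

    return ({p: normalize(r) for p, r in t_count.items()},
            {s: normalize(r) for s, r in e_count.items()})


if __name__ == "__main__":
    trace = ["even", "even", "low", "high", "low", "even", "even"]
    decoded = viterbi(trace)
    new_trans, new_emit = count_path(trace, decoded)
    print(decoded)
    print(new_trans)
```

In this sketch, positions decoded as `uneven` stand in for candidate erroneous regions. A fuller training loop would repeat decoding and counting over many traces and evaluate the resulting parameters on held-out folds.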