Fault-tolerant pattern mining of exon skipped sequences from alternative splicing database

Bibliographic Details
Main Authors: Peng, Sing-Long, 彭興龍
Other Authors: Shan, Man-Kwan
Format: Others
Language: zh-TW
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/72646404773450399948
Description
Summary: Master's === National Chengchi University === Department of Computer Science === 93 === Before RNA sequences are translated into proteins, eukaryotes may produce functionally different proteins from the same RNA sequence. This is due to the influence of the environment, secondary structure, specific substring patterns, and other factors. This mechanism is called alternative splicing. At present, there is still not enough research to determine the causes of alternative splicing and the information critical to it. We try to develop suitable data mining techniques to analyze large numbers of RNA sequences and find possible patterns that affect alternative splicing. There are seven possible types of alternative splicing; we focus on the “exon skipping” type. Based on an analysis of exon skipping data, we propose two fault-tolerant data mining methods and procedures: “Full Sequence Pattern Mining (FSPM)” and “Inverted Repeat Pattern Mining (IRPM).” FSPM mines all fault-tolerant frequent substrings in whole intron sequences and then derives consensus sequential patterns using the ApproxMap method proposed by Kum[18]. IRPM looks for consensus patterns with an inverted repeat structure. Because inverted repeat patterns often appear in biological sequences, and such structural patterns may result in exon skipping, this method may discover some important patterns. Finally, we mined patterns from two alternative splicing datasets, “Avatar-120” and “ISIS-54,” using the two proposed methods. The support and average number of faults of the mined patterns are discussed. These patterns were also compared, using a global alignment method, with two patterns (C/G-rich) discovered by Miriami[24]. Novel patterns, measured by discrimination, are reported.
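The summary mentions the support and average fault number of fault-tolerant patterns without defining them. The following is a minimal sketch of one plausible reading, not the thesis's actual procedure. It assumes that a "fault" is a single-character mismatch (Hamming distance), that a sequence supports a pattern when some window of it lies within `max_faults` mismatches, and that the average is taken over supporting sequences only. The function names and the example sequences are hypothetical.

```python
def min_faults(pattern, sequence):
    """Smallest number of mismatches between `pattern` and any
    same-length window of `sequence`; None if no window fits.
    Assumption: faults are substitutions only (no insertions/deletions)."""
    m = len(pattern)
    if m == 0 or len(sequence) < m:
        return None
    return min(
        sum(a != b for a, b in zip(pattern, sequence[i:i + m]))
        for i in range(len(sequence) - m + 1)
    )


def support_and_avg_faults(pattern, sequences, max_faults):
    """Return (support, average fault number) for `pattern`.
    support = fraction of sequences with a window within `max_faults`;
    average = mean best-match fault count over those sequences."""
    matched = []
    for seq in sequences:
        d = min_faults(pattern, seq)
        if d is not None and d <= max_faults:
            matched.append(d)
    support = len(matched) / len(sequences) if sequences else 0.0
    avg = sum(matched) / len(matched) if matched else 0.0
    return support, avg


# Hypothetical toy "intron" sequences, for illustration only.
introns = ["AAGGGCTT", "TTGGCCAA", "ACACACAC"]
support, avg = support_and_avg_faults("GGGC", introns, max_faults=1)
print(support, avg)
```

Under these assumptions, a candidate would count as frequent when its support reaches a chosen minimum threshold. Consensus extraction with ApproxMap and the inverted repeat search are separate steps that this sketch does not cover.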