Summary: | Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === Tandem repeat structures are widely distributed among all classes of proteins. Their basic repetitive structural units are functionally diverse and strongly influence protein interactions and biological responses across organisms. One of the most common types of protein repeat structure is the α-solenoid tandem repeat, whose internal repeat units share low sequence similarity with one another. A successful segmentation and classification system for α-solenoid repeats therefore cannot rely mainly on sequence-alignment-based approaches. To support comprehensive analysis of fundamental repeat unit segmentation, subclass identification, and functional annotation of such repeat structures, we developed an automatic identification system based on geometrical characteristics and secondary structure information. The Psi and Alpha dihedral angles were used to locate candidate α-helix elements, and the included angle between the vectors defined by neighboring α-helix elements was calculated to construct fundamental repeat units. The lengths of helix elements, geometric curvature, and the relative positions of neighboring repeat units were used to classify the subtypes of α-solenoid tandem repeats. To evaluate the prediction system, we used three data sources: 923 α-solenoid repeats collected from the RepeatsDB database, 905 α-solenoid repeats retrieved from the CATH database, and 166 α-solenoid repeats collected from the SMART/Pfam database. The proposed system achieved a recall rate of 94.24%, a precision rate of 76.16%, a specificity rate of 99.76%, and an accuracy rate of 99.71% for identifying α-solenoid repeats.
For internal repeat unit segmentation of the identified repeats, the system achieved a recall rate of 94.20%, a precision rate of 94.66%, a specificity rate of 96.73%, and an accuracy rate of 95.62%. For subtype classification, the system achieved a recall rate of 81.76%, a precision rate of 82.46%, a specificity rate of 96.06%, and an accuracy rate of 93.38%. This is the first comprehensive classification system that identifies four different subtypes of α-solenoid repeats and also provides fundamental internal repeat segmentation and geometric annotation. The online system, with its user-friendly interface, can help structural biologists efficiently compare the common and unique features of different subtypes of α-solenoid tandem repeats, and it is beneficial for protein classification, annotation, and possibly biological experiments.
|
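The abstract describes computing the included angle between vectors formed by neighboring α-helix elements but does not give the formula or the way each helix vector is derived. The following is a minimal sketch under stated assumptions, not the thesis's implementation: each helix is summarized by an axis vector running from the centroid of its first few Cα atoms to the centroid of its last few, and the included angle is taken from the normalized dot product. The names `centroid`, `helix_axis`, `included_angle`, and the `window` parameter are hypothetical and introduced only for illustration.

```python
import math


def centroid(points):
    """Return the arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def helix_axis(ca_coords, window=4):
    """Approximate a helix axis vector from C-alpha coordinates.

    Assumption (not from the source): the axis is the vector from the
    centroid of the first `window` atoms to the centroid of the last
    `window` atoms. Four residues is roughly one alpha-helical turn.
    """
    if len(ca_coords) < 2 * window:
        raise ValueError("helix too short for the chosen window")
    start = centroid(ca_coords[:window])
    end = centroid(ca_coords[-window:])
    return tuple(e - s for s, e in zip(start, end))


def included_angle(u, v):
    """Return the angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("zero-length vector has no direction")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Two idealized, made-up helices: one along +z, one along -z (antiparallel).
    helix_a = [(0.0, 0.0, 1.5 * i) for i in range(8)]
    helix_b = [(10.0, 0.0, 12.0 - 1.5 * i) for i in range(8)]
    angle = included_angle(helix_axis(helix_a), helix_axis(helix_b))
    print(f"included angle: {angle:.1f} degrees")
```

In this sketch, an angle near 180 degrees indicates an antiparallel helix pair. Any threshold for deciding that two neighboring helices belong to the same repeat unit would have to come from the thesis itself and is deliberately left out here.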