Summary: | 碩士 === 國立中山大學 === 資訊工程學系研究所 === 100 === Essential proteins affect the cellular life deeply, but it is hard to identify them. Protein-protein interaction is one of the ways to disclose whether a protein is essential or not. We notice that many researchers use the feature set composed of topology properties from protein-protein interaction to predict the essential proteins. However, the functionality of a protein is also a clue to determine its essentiality. In this thesis, to build SVM models for predicting the essential proteins, our feature set contains the sequence properties which can influence the protein function, topology properties and protein properties. In our experiments, we download Scere20070107, which contains 4873 proteins and 17166 interactions, from DIP database. The ratio of essential proteins to nonessential proteins is nearly 1:4, so it is imbalanced. In the imbalanced dataset, the best values of F-measure, MCC, AIC and BIC of our models are 0.5197, 0.4671, 0.2428 and 0.2543, respectively. We build another balanced dataset with ratio 1:1. For balanced dataset, the best values of F-measure, MCC, AIC and BIC of our models are 0.7742, 0.5484, 0.3603 and 0.3828, respectively. Our results are superior to all previous results with various measurements.
|