Prediction for the Essential Protein with the Support Vector Machine

Bibliographic Details
Main Authors: Zih-Jie Yang, 楊子杰
Other Authors: Chang-Biau Yang
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/47690675654629975128
Description
Summary: Master's === National Sun Yat-sen University === Department of Computer Science and Engineering === 100 === Essential proteins are critical to cellular life, but they are hard to identify. Protein-protein interaction (PPI) data offer one way to determine whether a protein is essential. Many researchers predict essential proteins using feature sets composed of topological properties of the PPI network. However, the function of a protein is also a clue to its essentiality. In this thesis, we build SVM models for predicting essential proteins with a feature set that combines sequence properties, which can influence protein function, with topological properties and protein properties. In our experiments, we use the Scere20070107 dataset downloaded from the DIP database, which contains 4873 proteins and 17166 interactions. The ratio of essential to nonessential proteins is nearly 1:4, so the dataset is imbalanced. On the imbalanced dataset, the best values of F-measure, MCC, AIC and BIC of our models are 0.5197, 0.4671, 0.2428 and 0.2543, respectively. We also build a balanced dataset with a 1:1 ratio. On the balanced dataset, the best values of F-measure, MCC, AIC and BIC of our models are 0.7742, 0.5484, 0.3603 and 0.3828, respectively. Our results are superior to all previous results under the various measures.
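
The summary reports F-measure and MCC because plain accuracy can mislead on a class ratio near 1:4. The following is a minimal sketch of how these two measures are commonly computed from confusion-matrix counts; it is not the thesis's code, and the function names and the toy counts (20 essential, 80 nonessential) are assumptions chosen only for illustration.

```python
import math


def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical 1:4 dataset: 20 essential (positive), 80 nonessential.
# A trivial classifier that labels every protein "nonessential":
tp, tn, fp, fn = 0, 80, 0, 20
trivial = (accuracy(tp, tn, fp, fn), f_measure(tp, fp, fn), mcc(tp, tn, fp, fn))

# A hypothetical classifier that actually finds some essential proteins:
tp, tn, fp, fn = 12, 70, 10, 8
useful = (accuracy(tp, tn, fp, fn), f_measure(tp, fp, fn), mcc(tp, tn, fp, fn))

print("trivial:", trivial)
print("useful: ", useful)
```

In this toy case the trivial classifier reaches 80% accuracy while its F-measure and MCC are both zero, which is why measures of this kind are preferred when essential proteins are the minority class.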