Improvement of Predicting Human Protein Subcellular Localization Through Integrated Machine Learning Methods



Bibliographic Details
Main Authors: LIN,TSAI-YU, 林采妤
Other Authors: YU,CHIN-SHENG
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/5949gw
Description
Summary: Master's thesis === Feng Chia University === Department of Information Engineering and Computer Science === 106 === Predicting the subcellular location of proteins has been an important topic in computational biology research over the past decade. Knowing where a protein is located in the cell helps in understanding its function and its protein-protein interactions. However, identifying subcellular locations experimentally is often laborious and expensive, so for large-scale protein datasets with unknown locations, more efficient computational prediction tools are highly desirable. Many methods have been proposed to predict locations for large-scale protein datasets, and statistical machine learning methods are widely used to build the models. The key step in these predictions is encoding the amino acid sequence as a feature vector. In this thesis, we compute several n-peptide amino acid compositions from protein sequences and model these composition features with a machine learning approach: a Support Vector Machine (SVM) [1] combined with a genetic algorithm (GA), where the GA is used to select features. Finally, the prediction results are evaluated by recall, precision, and F1 and compared with previous methods. The results show that our method achieves an overall F1 of 64%. Although our method is simpler, its results are comparable to or better than those of more complex methods.
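As an illustration of the encoding and evaluation steps named in the summary, here is a minimal Python sketch. It is not the thesis's implementation. The function names `n_peptide_composition` and `per_class_scores`, the restriction to the 20 standard residues, and the normalization by window count are all assumptions made for this example. The SVM training and GA feature selection are not shown, and the summary does not state how the overall F1 is averaged, so only per-class scores are computed.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: the 20 standard residues only


def n_peptide_composition(sequence, n):
    """Encode a sequence as the relative frequency of every length-n peptide.

    The vector has 20**n entries, ordered like itertools.product over
    AMINO_ACIDS (for n=2: AA, AC, AD, ...).
    """
    sequence = sequence.upper()
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    # Count every overlapping window of length n.
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    # Windows containing non-standard residues are ignored (assumption).
    total = sum(counts[k] for k in keys)
    return [counts[k] / total if total else 0.0 for k in keys]


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    # Toy sequence and toy labels, invented purely for illustration.
    dipeptides = n_peptide_composition("ACAC", 2)
    print(len(dipeptides))
    print(per_class_scores(["nucleus", "nucleus", "cytoplasm"],
                           ["nucleus", "cytoplasm", "cytoplasm"]))
```

The vector length grows as 20^n (20, 400, 8000 for n = 1, 2, 3), which is one reason a feature selection step such as the GA mentioned in the summary is useful when several n-peptide compositions are combined.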