Summary: | Master's === National Chiao Tung University === Department of Computer and Information Science === 93 === The first step in understanding a protein's function(s) is often to identify its subcellular location(s). Although scientists have long worked on identifying the subcellular locations of proteins, an effective and efficient way to distinguish them has yet to be fully achieved. Here, we introduce GAINER, a novel genetic-algorithm-based integrative approach for discovering protein subcellular localization signatures. GAINER encodes amino acid indices, alphabet indexing, and approximate patterns as signature candidates, and uses proteins with known subcellular locations as training data to mine discriminative signatures. We also developed a Bayesian classifier, GALOP, which predicts a protein's subcellular location(s) from the probabilities of the detected signatures at distinct subcellular locations. By comparison with the well-known tools TargetP and iPSORT, we show that GAINER can effectively and efficiently discover protein subcellular localization signatures. In addition, inspecting these signatures reveals their biochemical meaning, which helps biologists understand protein subcellular sorting and targeting mechanisms. Finally, GALOP can annotate relevant databases accurately and thoroughly, which can greatly help biologists in proteomics research.
|
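The idea summarized above, predicting a location from the probabilities of detected signatures at each location, can be illustrated with a minimal sketch. The summary does not specify GALOP's actual model, so the code below assumes a plain Bernoulli naive Bayes classifier with Laplace smoothing. The function names `train` and `predict`, the regular-expression signatures, and the toy sequences are all hypothetical and are not taken from GAINER or GALOP.

```python
import math
import re
from collections import Counter, defaultdict


def train(proteins, signatures):
    """Estimate P(location) and P(signature present | location).

    proteins:   list of (sequence, location) pairs with known locations
    signatures: list of regular-expression strings (hypothetical stand-ins
                for mined signatures)
    """
    loc_counts = Counter(loc for _, loc in proteins)
    hits = defaultdict(Counter)  # hits[loc][sig] = proteins at loc matching sig
    for seq, loc in proteins:
        for sig in signatures:
            if re.search(sig, seq):
                hits[loc][sig] += 1

    total = len(proteins)
    model = {}
    for loc, n in loc_counts.items():
        prior = n / total
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        likelihood = {sig: (hits[loc][sig] + 1) / (n + 2) for sig in signatures}
        model[loc] = (prior, likelihood)
    return model


def predict(model, signatures, seq):
    """Return the location with the highest log-posterior score for seq."""
    scores = {}
    for loc, (prior, likelihood) in model.items():
        score = math.log(prior)
        for sig in signatures:
            p = likelihood[sig]
            # Presence and absence of a signature both contribute evidence.
            score += math.log(p if re.search(sig, seq) else 1.0 - p)
        scores[loc] = score
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
signatures = [r"^M[LR]{2}", r"KDEL$"]
training = [
    ("MLLRAAA", "mitochondrion"),
    ("MRRSTLL", "mitochondrion"),
    ("MAAAKDEL", "ER"),
    ("MSTVKDEL", "ER"),
]
model = train(training, signatures)
```

With this toy model, `predict(model, signatures, "MLRGGGG")` favors the mitochondrion class and `predict(model, signatures, "MGGGKDEL")` favors the ER class, because each query matches only the signature that is frequent in that class's training proteins.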