Master's thesis, National Tsing Hua University, Department of Computer Science, academic year 101.
Gene prioritization is an important problem in bioinformatics and in drug development for cancer. Many computational approaches have been developed for it, and each has its own strengths. Our goal is to improve gene prioritization performance by combining these approaches, integrating the individual learning methods into an ensemble.
Because the range of rank scores is very narrow and the available information is insufficient for a perfect prioritization, the original prioritization methods tend to assign identical scores to many genes. We therefore add a weighting function to the algorithm and adjust its parameters according to (1) the absolute rank that each adopted method assigns to a given gene and (2) the pairwise ordering of every pair of genes in each prioritization result.
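The abstract does not give the exact form of the weighting function. As a rough illustration only, the sketch below assumes one plausible form: each method that ranks gene a strictly above gene b adds 1/log2(1 + rank of a) to the preference (a, b), so agreements near the top of a list count more (the absolute-rank part); opposing preferences from different methods then cancel (the pairwise part), and the net weights are normalized into a distribution over gene pairs of the kind a pairwise ranker such as RankBoost trains on. The function name `weighted_pair_distribution` and the logarithmic discount are hypothetical, not the method used in this thesis.

```python
import math
from itertools import combinations, permutations


def weighted_pair_distribution(rankings):
    """Turn several gene rankings into a weighted distribution over ordered pairs.

    rankings: list of dicts mapping gene -> rank (1 = best; equal ranks are ties).
    Returns {(a, b): weight}, read as "a should be ranked above b", summing to 1.
    The 1 / log2(1 + rank) discount is a hypothetical choice, not the thesis's.
    """
    genes = sorted(rankings[0])
    raw = {}
    for r in rankings:
        for a, b in permutations(genes, 2):
            if r[a] < r[b]:  # this method places a strictly above b; ties add nothing
                # Absolute-rank term: preferences near the top of a list weigh more
                raw[(a, b)] = raw.get((a, b), 0.0) + 1.0 / math.log2(1 + r[a])
    # Pairwise term: opposing preferences from different methods cancel out
    net = {}
    for a, b in combinations(genes, 2):
        diff = raw.get((a, b), 0.0) - raw.get((b, a), 0.0)
        if diff > 0:
            net[(a, b)] = diff
        elif diff < 0:
            net[(b, a)] = -diff
    total = sum(net.values())
    return {pair: w / total for pair, w in net.items()} if total else {}


# Two hypothetical methods; the second one ties genes A and B
rankings = [{"A": 1, "B": 2, "C": 3}, {"A": 1, "B": 1, "C": 2}]
for pair, w in sorted(weighted_pair_distribution(rankings).items()):
    print(pair, round(w, 3))
```

In this toy input the tie between A and B in the second method contributes nothing, so the pair (A, B) ends up with the smallest weight, while the pair (A, C), on which both methods agree near the top, gets the largest.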
We use the prostate cancer data in the OMIM database as the training set and the protein-coding gene data in the HGNC database as the test set, with the ToppGene Suite as the training tool. We adopt the RankBoost algorithm as the ensemble method for learning to rank. In our experiments, the ranks of most training genes improved, as predicted. The average precision, mean average precision, ROC curves, and AUC of our method are all better than those of each original method we employed (ToppGene, K-step Markov, HITS with Priors, and PageRank with Priors). In conclusion, we have designed an efficient modified ensemble learning method for the gene prioritization problem that performs well in our experiments.
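For readers unfamiliar with RankBoost, the following is a minimal sketch of the standard algorithm of Freund, Iyer, Schapire and Singer, paired with an average-precision check. It assumes each base method (for example ToppGene or PageRank with Priors) has already produced a normalized score per gene, uses thresholded scores as {0,1}-valued weak rankers, and starts from a uniform distribution over (unknown, known) gene pairs rather than the modified weighting described above. The function names, toy scores, and number of rounds are all hypothetical.

```python
import math


def rankboost(features, pairs, rounds=10):
    """Minimal RankBoost with {0,1}-valued threshold weak rankers.

    features: dict gene -> list of normalized scores, one per base method.
    pairs: list of (lo, hi) gene pairs where hi should be ranked above lo.
    Returns H(gene) = sum_t alpha_t * h_t(gene); higher means more relevant.
    """
    n_methods = len(next(iter(features.values())))
    dist = {p: 1.0 / len(pairs) for p in pairs}  # uniform initial distribution D_1
    model = []  # (alpha, method index, threshold) chosen in each round
    for _ in range(rounds):
        best = None
        for i in range(n_methods):
            for theta in sorted({v[i] for v in features.values()}):
                # Weak ranker h(x) = 1 if method i scores x above theta, else 0;
                # r = sum over pairs of D * (h(hi) - h(lo)) rewards correct orderings
                r = sum(w * ((features[hi][i] > theta) - (features[lo][i] > theta))
                        for (lo, hi), w in dist.items())
                if best is None or abs(r) > abs(best[0]):
                    best = (r, i, theta)
        r, i, theta = best
        r = max(min(r, 1 - 1e-9), -1 + 1e-9)  # keep alpha finite when |r| = 1
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        model.append((alpha, i, theta))
        # D_{t+1}(lo, hi) is proportional to D_t(lo, hi) * exp(alpha * (h(lo) - h(hi))),
        # so pairs the chosen weak ranker misorders gain weight in the next round
        for lo, hi in dist:
            h_lo = features[lo][i] > theta
            h_hi = features[hi][i] > theta
            dist[(lo, hi)] *= math.exp(alpha * (h_lo - h_hi))
        z = sum(dist.values())
        dist = {p: w / z for p, w in dist.items()}
    return lambda g: sum(a * (features[g][k] > t) for a, k, t in model)


def average_precision(ranked, relevant):
    """Mean of precision@k over the positions k where a relevant gene appears."""
    hits, total = 0, 0.0
    for k, g in enumerate(ranked, start=1):
        if g in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


# Toy scores from two hypothetical base methods; G1 and G2 are the known genes
features = {"G1": [0.9, 0.3], "G2": [0.4, 0.8], "G3": [0.6, 0.1], "G4": [0.1, 0.5]}
relevant = {"G1", "G2"}
pairs = [(lo, hi) for lo in ("G3", "G4") for hi in ("G1", "G2")]
H = rankboost(features, pairs, rounds=3)
ranked = sorted(features, key=H, reverse=True)
for i in range(2):
    single = sorted(features, key=lambda g: features[g][i], reverse=True)
    print(f"method {i} alone: AP = {average_precision(single, relevant):.3f}")
print("RankBoost:", ranked, f"AP = {average_precision(ranked, relevant):.3f}")
```

On these toy scores each base method alone reaches an average precision of 5/6, because each places one known gene below an unknown one, while three boosting rounds rank both known genes first. The numbers only illustrate the mechanism and are not the experimental results reported here.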