Summary: Master's thesis, Huafan University, Master's Program, Department of Information Management, academic year 95 (ROC calendar).
Owing to the rapid development of bioinformatics technology, biochips have matured and their production cost has dropped significantly, so gene data are now easy to obtain. However, microarray data differ from conventional statistical data: a typical ovarian cancer microarray dataset contains the expression levels of tens of thousands of genes on a genomic scale. This thesis develops an approach with reasonable efficiency for analyzing such microarray data.
The analysis begins by applying regression analysis to the microarray data. Regression is used to select target genes: the 200 genes with the highest or lowest residuals are retained. A support vector machine (SVM) and a genetic algorithm (GA) are then used to further screen these target genes for disease-linked genes, and several sets of disease-linked genes are identified according to different fitness values. Analysis of variance (ANOVA) is additionally used to identify which of these genes can discriminate ovarian cancer. Fuzzy c-means (FCM) and hierarchical clustering are then applied to classify ovarian cancer. Finally, classification accuracy is used to find the smallest set of disease-linked genes that gives the best performance. The resulting disease-linked genes can be used to classify ovarian cancer.
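The abstract does not state which regression model is fitted. As a minimal sketch, assume each gene's mean expression in cancer samples is regressed on its mean expression in normal samples by ordinary least squares, and genes lying far from the fitted line are kept as target genes. The names `fit_line` and `select_by_residual` are hypothetical, and ranking by absolute residual is only one reading of "highest or lowest residuals".

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def select_by_residual(normal_means, cancer_means, n=200):
    """Return indices of the n genes with the most extreme residuals.

    Hypothetical setup: each gene's mean cancer expression is regressed on
    its mean normal expression; genes far above or below the fitted line
    are kept as candidate target genes.
    """
    a, b = fit_line(normal_means, cancer_means)
    residuals = [y - (a + b * x) for x, y in zip(normal_means, cancer_means)]
    # Rank by absolute residual so both very high and very low residuals count.
    order = sorted(range(len(residuals)), key=lambda i: abs(residuals[i]), reverse=True)
    return order[:n]


# Toy data: gene 2 departs strongly from the linear trend.
print(select_by_residual([1, 2, 3, 4, 5], [1, 2, 10, 4, 5], n=1))  # prints [2]
```

With real data, `normal_means` and `cancer_means` would hold one value per gene and `n=200` would keep the 200 most extreme genes.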
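The ANOVA step in the abstract can be illustrated with a one-way F statistic computed per gene: a large F means the gene's between-class variation dominates its within-class variation, so it separates the classes well. This is a sketch under the assumption that genes are ranked by F across sample classes; `anova_f` and `rank_genes_by_f` are hypothetical helper names, and the class labels in the example are invented toy values.

```python
import math
from statistics import fmean


def anova_f(groups):
    """One-way ANOVA F statistic; groups is one list of values per class."""
    all_values = [v for g in groups for v in g]
    grand = fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return math.inf  # perfectly separated classes with no within-class spread
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes_by_f(expr, labels):
    """Return gene indices sorted by decreasing F statistic.

    expr[g][s] is gene g's expression in sample s; labels[s] is sample s's class.
    """
    classes = sorted(set(labels))
    scores = []
    for g, row in enumerate(expr):
        groups = [[x for x, lab in zip(row, labels) if lab == c] for c in classes]
        scores.append((anova_f(groups), g))
    return [g for _, g in sorted(scores, reverse=True)]


labels = ["normal", "normal", "tumor", "tumor"]
expr = [[1.0, 1.2, 5.0, 5.2],   # gene 0: separates the classes
        [1.0, 5.0, 1.2, 5.2]]   # gene 1: does not
print(rank_genes_by_f(expr, labels))  # prints [0, 1]
```

In practice, a library routine such as SciPy's one-way ANOVA would replace the hand-written `anova_f`.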
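Fuzzy c-means, one of the two clustering methods named above, alternates between updating each sample's membership degrees and recomputing cluster centers as membership-weighted means. Below is a minimal pure-Python sketch of the standard algorithm with fuzzifier `m`. It assumes samples are points in gene-expression space, and it converts memberships to hard labels by taking the largest membership; that labeling rule, the default initialization from the first `c` points, and all names here are illustrative assumptions. Hierarchical clustering is not shown.

```python
import math


def _memberships(points, centers, m):
    """Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))."""
    rows = []
    for p in points:
        d = [math.dist(p, ctr) for ctr in centers]
        zeros = [j for j, dj in enumerate(d) if dj == 0.0]
        if zeros:
            # Point sits exactly on one or more centers: share membership among them.
            row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(d))]
        else:
            row = [1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(len(d)))
                   for j in range(len(d))]
        rows.append(row)
    return rows


def fuzzy_c_means(points, c, m=2.0, init_centers=None, max_iter=100, tol=1e-6):
    """Return (centers, memberships); memberships[i][j] is point i's degree in cluster j."""
    # Naive deterministic start: caller-supplied centers, else the first c points.
    centers = [list(p) for p in (init_centers or points[:c])]
    dim = len(points[0])
    for _ in range(max_iter):
        u = _memberships(points, centers, m)
        # Center update: mean of the points weighted by u_ij ** m.
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wi * p[t] for wi, p in zip(w, points)) / total
                                for t in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(points, centers, m)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, u = fuzzy_c_means(points, c=2, init_centers=[(0.0, 0.0), (10.0, 10.0)])
labels = [max(range(2), key=lambda j: row[j]) for row in u]
print(labels)  # prints [0, 0, 1, 1]
```

Comparing such labels against known sample classes gives the classification accuracy used in the final step to choose the smallest well-performing gene set.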