Summary: | Master's thesis === National Chiao Tung University === Institute of Bioinformatics and Systems Biology === 103 === The World Health Organization (WHO) has noted that early detection of cancer greatly increases the chances of successful treatment. Genetic testing plays an integral role in clinical diagnosis. For breast cancer, mammography, ultrasound, and magnetic resonance imaging (MRI) are considered effective strategies for early detection. Once breast cancer is suspected, a biopsy can determine whether the tumor is benign or malignant. Furthermore, genetic testing serves as a prognostic molecular tool that determines the breast cancer subtype, which informs the choice of treatment.
In general, a biomarker should have the following properties: (1) it is readily quantifiable in accessible biological samples, (2) its expression is consistent across the general population, and (3) its expression is significantly increased in the disease condition.
Recently, genome-wide expression profiling (e.g., microarrays and NGS) has shown tremendous promise for revealing patterns of coordinately regulated genes for the early detection of diseases. The t-test is currently a commonly used method for identifying biomarkers from genome-wide expression profiles. However, the t-test often identifies numerous significant genes that lack these biomarker properties.
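For reference, the per-gene t-test baseline can be sketched as follows. This is a minimal illustration, assuming Welch's unequal-variance form of the test and expression values already grouped into normal and disease samples; the function names and the toy profiles are hypothetical and are not data from the thesis. The statistic only rewards a large mean difference relative to the standard error, which is why it says nothing directly about properties (1) and (2).

```python
import math
from statistics import mean, variance

def welch_t(normal, disease):
    """Welch's two-sample t-statistic for one gene (disease minus normal)."""
    m_n, m_d = mean(normal), mean(disease)
    # statistics.variance is the sample variance (n - 1 denominator)
    se = math.sqrt(variance(normal) / len(normal) + variance(disease) / len(disease))
    if se == 0:
        # Both groups are constant: report 0 or an infinite statistic
        return 0.0 if m_d == m_n else math.copysign(math.inf, m_d - m_n)
    return (m_d - m_n) / se

def rank_genes_by_t(profiles):
    """profiles maps gene name -> (normal_values, disease_values).

    Returns gene names ordered by |t|, largest first.
    """
    scores = {g: welch_t(n, d) for g, (n, d) in profiles.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)

# Hypothetical toy profiles, not data from the thesis
profiles = {
    "GENE_A": ([5.0, 5.2, 4.9, 5.1], [8.0, 8.1, 7.9, 8.2]),
    "GENE_B": ([5.0, 5.1, 4.8, 5.2], [5.1, 5.0, 5.3, 4.9]),
}
print(rank_genes_by_t(profiles))
```

In practice the statistic would be converted to a p-value and corrected for multiple testing across all genes; that step is omitted here to keep the sketch dependency-free.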
Here, we propose a new method, called Consensus Mutual Information (CoMI), for analyzing genome-wide expression profiles and discovering biomarkers that fit these properties. First, to satisfy the first property (readily quantifiable), we filter out low-expression genes and keep only highly expressed ones. Based on the second and third properties, we developed a new scoring function, SCoMI, to identify genes that are consistently expressed and significantly differentially expressed between the normal and disease states. SCoMI consists of three components: mutual information (SMI), entropy (Scon), and the difference of group ranks (Sdist). Mutual information and entropy measure differential expression and expression consistency, respectively, while Sdist evaluates the differential expression between the normal and disease states from the samples' ranks.
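The abstract names the components of SCoMI but not their formulas, so the sketch below is only one plausible reading, and every concrete choice in it is an assumption: the low-expression filter is a mean-expression threshold, each gene's values are discretized into equal-width bins, SMI is the mutual information between binned expression and the normal/disease label, Scon is one minus the average within-group normalized entropy, Sdist is the difference of mean ranks scaled to [-1, 1], and the three are simply multiplied. The names `comi_components`, `s_comi`, `min_mean`, and `n_bins` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def discretize(values, n_bins=3):
    """Equal-width binning of one gene's values (an assumed discretization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def comi_components(normal, disease, n_bins=3):
    """Return assumed (SMI, Scon, Sdist) for one gene; n_bins must be >= 2."""
    values = list(normal) + list(disease)
    labels = [0] * len(normal) + [1] * len(disease)
    bins = discretize(values, n_bins)
    # SMI: I(bin; label) = H(bin) + H(label) - H(bin, label)
    s_mi = entropy(bins) + entropy(labels) - entropy(list(zip(bins, labels)))
    # Scon: 1 minus mean within-group normalized entropy (1 = fully consistent)
    h_max = math.log2(n_bins)
    groups = (bins[:len(normal)], bins[len(normal):])
    s_con = 1 - sum(entropy(g) / h_max for g in groups) / len(groups)
    # Sdist: mean rank (disease) minus mean rank (normal), divided by N/2,
    # which is the largest possible difference, so it lies in [-1, 1]
    ranks = average_ranks(values)
    r_normal = sum(ranks[:len(normal)]) / len(normal)
    r_disease = sum(ranks[len(normal):]) / len(disease)
    s_dist = (r_disease - r_normal) / (len(values) / 2)
    return s_mi, s_con, s_dist

def s_comi(normal, disease, min_mean=1.0, n_bins=3):
    """Hypothetical combined score; None means filtered as low-expression.

    A negative result indicates lower expression in the disease group.
    """
    values = list(normal) + list(disease)
    if sum(values) / len(values) < min_mean:
        return None
    s_mi, s_con, s_dist = comi_components(normal, disease, n_bins)
    return s_mi * s_con * s_dist

# Hypothetical toy genes
gene_up = ([1.0, 1.1, 0.9, 1.0], [5.0, 5.1, 4.9, 5.0])     # consistent increase
gene_mixed = ([1.0, 1.0, 1.0, 1.0], [5.0, 5.0, 1.0, 1.0])  # increase in half the patients
for name, (n, d) in {"gene_up": gene_up, "gene_mixed": gene_mixed}.items():
    print(name, comi_components(n, d), s_comi(n, d))
```

Under these assumptions a gene raised in only some patients keeps a positive Sdist but is penalized through a lower Scon and SMI, which mirrors the intent of biomarker properties (2) and (3).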
We evaluated our method and the SCoMI scoring function by analyzing genome-wide expression profiles and discovering biomarkers in two microarray datasets, one for breast cancer and one for Alzheimer's disease. To examine the biological meaning and biomarker properties of the genes selected by our method, we performed Gene Ontology term enrichment analysis (biological process and cellular component) and gene clustering. The results indicate that the selected genes are strongly associated with breast cancer and Alzheimer's disease. In addition, we combined the t-test with our method on both datasets. In the breast cancer dataset, we identified 115 genes that not only discriminate between normal and tumor samples but also identify patients with the basal-like subtype. Applying these 115 genes to an independent TCGA dataset yielded similar results. Our method and the SCoMI scoring function thus provide a useful approach for discovering sets of biomarkers for the early detection of diseases.
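The abstract does not state which statistic was used for the Gene Ontology enrichment. A common choice is a one-sided hypergeometric test, sketched below as an assumption rather than as the thesis's procedure: it asks how likely it is to see at least k annotated genes among n selected genes when K of the N background genes carry the GO term. The function name and the example counts (other than the 115 selected genes mentioned above) are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn).

    k: selected genes annotated with the GO term
    n: number of selected genes
    K: background genes annotated with the term
    N: background size
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 12 of 115 selected genes carry a GO term that
# annotates 200 of 20,000 background genes
p = hypergeom_enrichment_p(k=12, n=115, K=200, N=20000)
print(f"{p:.3e}")
```

Because many GO terms are tested at once, the resulting p-values would normally be adjusted for multiple comparisons before a term is called enriched.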