Summary: | Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 106 === Gene Ontology (GO) overrepresentation analysis is mainly used to explain the correlated behavior of differentially expressed genes. Traditional approaches test whether a cluster of differentially expressed genes is concentrated within a specific GO term, and use the hypergeometric distribution to calculate a p-value for each GO term. GO terms with lower p-values are considered more relevant to the biological experiment. However, the traditional analysis ignores some hidden interactions between genes. For example, long non-coding RNAs (lncRNAs) might regulate and inhibit their target genes, which reduces the significance of some GO terms. A second problem arises from the inheritance property of the GO hierarchical structure: top-level GO terms belonging to general categories consistently receive lower p-values because annotated genes and their associated annotations are not uniformly distributed. We therefore propose solutions to these two problems in order to increase the effectiveness and accuracy of GO overrepresentation analysis. First, we assume that differentially expressed lncRNAs might regulate their neighboring genes, and we evaluate whether these lncRNAs overlap with neighboring genes or with the transcription factor binding sites (TFBS) of neighboring genes. When such an overlap occurs, the neighboring genes are included in the GO functional overrepresentation analysis regardless of whether they are themselves differentially expressed. Second, following the GO hierarchical structure, a GO term with a significantly low p-value can be removed if this parent node has any child GO terms with significant p-values. To validate the proposed system, we used two RNA-seq experiments, birc5a knock-down and birc5a knock-out, in zebrafish embryogenesis. 
For the birc5a knock-down experiment, compared with the traditional GO overrepresentation analysis, 5 additional neuron-development-related GO terms and 4 additional calcium-channel-related GO terms were discovered; for the birc5a knock-out experiment, 3 additional neuron-development-related GO terms and 3 additional calcium-related GO terms were identified. Several published papers support these associations. In summary, the proposed approaches provide an accurate functional annotation method for biological and medical researchers conducting transcriptome-related experiments.
|
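The two statistical steps described in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the significance threshold `alpha=0.05`, the one-sided upper-tail form of the test, and the placeholder term names are all assumptions made for illustration. The first function computes a hypergeometric p-value for one GO term; the second applies one reading of the hierarchy rule, in which a significant term is dropped when any of its direct children is also significant.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: annotated genes in the background
    K: background genes annotated to the GO term
    n: genes in the study set (e.g. differentially expressed genes)
    k: study-set genes annotated to the GO term
    """
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible combinations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def prune_parents(pvalues, children, alpha=0.05):
    """Return the significant terms that have no significant direct child.

    pvalues:  dict mapping term name -> p-value
    children: dict mapping term name -> iterable of direct child terms
    alpha:    assumed significance threshold (not taken from the source)
    """
    significant = {term for term, p in pvalues.items() if p < alpha}
    return {
        term
        for term in significant
        if not any(child in significant for child in children.get(term, ()))
    }


if __name__ == "__main__":
    # Toy numbers only: 10 background genes, 4 annotated to the term,
    # 3 genes in the study set, 2 of them annotated to the term.
    print(hypergeom_pvalue(10, 4, 3, 2))

    # Placeholder term names, not real GO identifiers.
    pvals = {"parent_term": 0.001, "child_term": 0.01, "other_child": 0.2}
    tree = {"parent_term": ["child_term", "other_child"]}
    print(prune_parents(pvals, tree))
```

Under this reading, the general parent term is discarded in favor of its more specific significant child, which matches the stated aim of reducing the bias toward top-level GO terms. Genes added through lncRNA overlap would simply enlarge the study set (`n` and possibly `k`) before the p-value is computed.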