Summary: | Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === High-throughput sequencing technology provides an efficient and effective approach for discovering the sequence content and corresponding abundance of RNAs in a biological sample; this approach is called RNA sequencing (RNA-seq). In recent years, RNA-seq applications have expanded rapidly because the technique's high throughput and relatively fast experimental turnaround have brought unprecedented progress in gene functional annotation, gene regulation analysis, and verification of environmental factors. RNA-seq has been applied in various fields based on differential gene expression analysis. However, with the increasing number of sequenced reads and reference model species, choosing appropriate reference species for gene annotation has become a new challenge. Therefore, this study proposes a novel approach for finding the most effective reference model species through comparison of ultra-conserved orthologous (UCO) genes among species. An online multiple species selection (MSS) system for RNA-seq differential expression analysis was developed, and a reference set of 167 eukaryotic model species was constructed from data retrieved from the RefSeq, KEGG, and UniProt online databases. The system not only supports selection of appropriate reference species through UCO and taxonomy associations, but also allows users to perform differential expression analysis with gene ontology and biological pathway approaches for functional annotation. In this thesis, we verified the correlation of UCO gene distance matrices among species and evaluated the results of various reference species selections on RNA-seq datasets from a de novo organism. The results showed that selecting multiple appropriate species could solve the problem of lacking annotation information and yield more accurate results than a single reference model species.
|