Gene Ontology Terms-based Analysis of Gene Expression Time Series by Using Support Vector Machines



Bibliographic Details
Main Author: Pei-Lin Chen (陳珮琳)
Other Authors: Rong-Ming Chen
Format: Others
Language: zh-TW
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/41473576621870952537
Description
Summary: Master's thesis === National University of Tainan (國立臺南大學) === Department of Computer Science and Information Engineering, Master's Program (資訊工程學系碩士班) === ROC academic year 102 === With advances in high-throughput microarray technologies, large volumes of time-series gene expression data can be acquired for many genes across multiple time points in a short time. By examining these data, we can find genes with similar expression profiles, which tend to be associated with common biological processes or functions. This helps predict regulatory relationships between genes and can be applied to the investigation of disease-related genes. Clustering, one of the primary tools for analyzing such data, partitions genes with similar characteristics into the same cluster according to prescribed features or measures. However, improving the performance of a clustering algorithm does not ensure that genes in the same cluster have similar or identical functional annotations. This thesis addresses the problem of improving the biological relevance of time-series gene expression clustering, so that the clustered genes are well distinguished from all other genes in terms of both their expression profiles and their functional annotations. We propose a Gene Ontology (GO) term-based analysis of gene expression time series using support vector machines (SVMs). Three experiments illustrate the effectiveness of the proposed approach, comparing it with two well-known traditional unsupervised clustering algorithms and with a previous SVM-based method that uses the functional categories defined by the Munich Information Center for Protein Sequences (MIPS) as the feature selection criteria. Experimental results show that the proposed method outperforms these three methods, indicating that GO terms are effective feature selection criteria for analyzing gene expression time series with SVMs.
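The abstract does not state which SVM formulation, kernel, or solver the thesis uses, nor exactly how GO terms define the training labels. As a minimal sketch under those assumptions, the Python below treats one GO term as a binary labeling of genes (annotated versus not annotated), mean-centers each expression profile, and trains a linear SVM by batch subgradient descent on the L2-regularized hinge loss, without an intercept. The function names, gene IDs, the term label `TERM_A`, and the hyperparameters `lam`, `eta`, and `epochs` are all hypothetical; this is a stand-in for illustration, not the author's method (a library-backed kernel SVM would be the more usual choice in practice).

```python
from typing import Dict, List, Set, Tuple

Profile = List[float]


def center(profile: Profile) -> Profile:
    """Subtract the mean so the classifier compares profile shapes, not baseline levels."""
    mean = sum(profile) / len(profile)
    return [v - mean for v in profile]


def build_dataset(profiles: Dict[str, Profile], annotations: Dict[str, Set[str]],
                  term: str) -> Tuple[List[Profile], List[int]]:
    """Label a gene +1 if it is annotated with `term`, else -1 (hypothetical labeling scheme)."""
    genes = sorted(profiles)
    xs = [center(profiles[g]) for g in genes]
    ys = [1 if term in annotations.get(g, set()) else -1 for g in genes]
    return xs, ys


def train_linear_svm(xs: List[Profile], ys: List[int], lam: float = 0.01,
                     eta: float = 0.1, epochs: int = 200) -> List[float]:
    """Batch subgradient descent on lam/2*||w||^2 + mean hinge loss (no intercept)."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the regularizer
        for x, y in zip(xs, ys):
            if y * sum(wj * xj for wj, xj in zip(w, x)) < 1.0:  # hinge term is active
                for j in range(d):
                    grad[j] -= y * x[j] / n
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w


def predict(w: List[float], x: Profile) -> int:
    """Return +1 (predicted to carry the GO term) or -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0.0 else -1


def leave_one_out_accuracy(xs: List[Profile], ys: List[int]) -> float:
    """Train on all genes but one, test on the held-out gene, and average the hits."""
    hits = 0
    for i in range(len(xs)):
        w = train_linear_svm(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += predict(w, xs[i]) == ys[i]
    return hits / len(xs)


# Hypothetical toy data: two rising profiles annotated with TERM_A, two falling ones not.
profiles = {
    "g1": [0.1, 0.5, 1.2, 2.0, 2.9],
    "g2": [1.0, 1.4, 2.1, 2.8, 3.5],
    "g3": [3.0, 2.2, 1.5, 0.9, 0.2],
    "g4": [2.5, 2.0, 1.1, 0.8, 0.1],
}
annotations = {"g1": {"TERM_A"}, "g2": {"TERM_A"}, "g3": {"TERM_B"}}

xs, ys = build_dataset(profiles, annotations, "TERM_A")
w = train_linear_svm(xs, ys)
print(predict(w, center([0.0, 0.7, 1.3, 1.9, 2.6])))  # an unseen rising profile
print(leave_one_out_accuracy(xs, ys))
```

Leave-one-out accuracy is used here only as a simple per-classifier score; the evaluation protocol in the thesis may differ. Mean-centering lets a model without an intercept separate rising from falling shapes regardless of each gene's baseline expression level.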
Finally, we present a preliminary analysis of the relationship between the levels of GO terms in the ontology's directed acyclic graph (DAG) and the classification accuracy of the corresponding SVM classifiers.
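The abstract does not define how a GO term's level is measured. One common convention, assumed here, is the length of the shortest path from the ontology's root; because GO is a DAG, a term with several parents can be reached by paths of different lengths, and longest-path depth is an equally plausible alternative. The sketch below computes shortest-path levels by breadth-first search and averages per-term classifier accuracies within each level. The term IDs, the `ROOT` node, the function names, and the accuracy values are hypothetical placeholders, not data or results from the thesis.

```python
from collections import defaultdict, deque
from typing import Dict, List


def go_levels(parents: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Level of each term = edges on the shortest path from `root` (root is level 0)."""
    children: Dict[str, List[str]] = defaultdict(list)
    for term, ps in parents.items():
        for p in ps:
            children[p].append(term)
    level = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search: the first visit gives the shortest distance
        term = queue.popleft()
        for child in children[term]:
            if child not in level:
                level[child] = level[term] + 1
                queue.append(child)
    return level


def mean_accuracy_by_level(levels: Dict[str, int],
                           accuracy: Dict[str, float]) -> Dict[int, float]:
    """Average the per-term classifier accuracies within each DAG level."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for term, acc in accuracy.items():
        buckets[levels[term]].append(acc)
    return {lvl: sum(v) / len(v) for lvl, v in sorted(buckets.items())}


# Hypothetical DAG (term -> parent terms); E has parents at different depths.
parents = {"A": ["ROOT"], "B": ["ROOT"], "C": ["A"], "D": ["A", "B"], "E": ["C", "B"]}
levels = go_levels(parents, "ROOT")

# Placeholder accuracies for illustration only; these are not results from the thesis.
accuracy = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6, "E": 0.5}
print(levels)
print(mean_accuracy_by_level(levels, accuracy))
```

In this toy DAG, term `E` is reachable both as ROOT → B → E and as ROOT → A → C → E, so the shortest-path convention assigns it level 2, while a longest-path convention would assign level 3. Which convention is used can change how accuracy appears to vary with level.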