Summary: | Master's === National Tsing Hua University === Department of Computer Science === 97 === In recent years, DNA microarray technology has played a key role in molecular biology research. As the number of experiments that follow biological processes over time increases, analyzing statistical patterns in time-series data has become a crucial step in exploring the complex dynamics of biological systems. Because of noise and measurement uncertainty, analyzing time-series data is more complicated than common data analysis. Early clustering methods such as k-means, self-organizing maps, and hierarchical clustering neglect the temporal dependence between successive time points. Probabilistic model-based clustering methods such as dynamic Bayesian networks (DBN) and hidden Markov models (HMM) are better suited to time series but are computationally inefficient. This thesis proposes an unsupervised clustering algorithm that combines a recently proposed clustering scheme, Affinity Propagation, with the spirit of consensus clustering over multiple clustering partitions. The proposed method investigates the relationships between genes across distinct time points by selecting intervals of time points, and it eliminates the influence of noise and outliers. Our method produces a clustering result without a priori knowledge of the number of clusters or the exemplars, and it demonstrates significant clustering accuracy on synthetic and real gene expression time-series datasets. In addition, the biological relevance of the clustering results is analyzed using Gene Ontology annotations and compared with earlier work. Our study suggests possible directions for clustering gene expression time-series data in future biological investigations.
|