Summary: | Master's === National University of Tainan === Master's Program, Graduate Institute of Information Education === 94 === With the growth of biotechnology, enormous biological databases have become useful data warehouses, holding microarray data, biomedical literature, sequence data, genome structure data, and more. In recent years, a major topic in bioinformatics has been mining hidden and meaningful information from heterogeneous data. The goal of such mining is to achieve higher accuracy than a single dataset allows and to predict gene-gene relations and genetic networks in advance. Multi-Sources Clustering (MSC) is an important and representative approach to mining multiple sources; however, MSC does not account for the fact that a gene may have multiple functions and take part in several biological pathways. MSC also ignores that heterogeneous data sources may differ in their properties and accuracy. In this study, we propose Multi-Source Soft Clustering (MSSC), which uses fuzzy c-means and soft CAST to address these problems. MSSC clusters each source before integrating them in order to improve overall accuracy, and uses the correlation coefficient to calculate the distance between different soft clusterings. Finally, the experiments show that MSSC is more accurate than MSC in both general and specific cases.
|
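
The abstract does not give the formulas behind MSSC, so the following is only a rough illustration of the two ideas it names: soft clustering with fuzzy c-means, and a correlation-based distance between two soft clusterings. The function names, the fuzzifier `m=2.0`, and in particular the choice to compare gene-by-gene co-membership matrices (so that the result does not depend on how clusters are labeled) are assumptions made for this sketch, not details taken from the thesis. Soft CAST and the integration step are not shown.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means. Returns (U, centers), U has shape (n_samples, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center, floored to avoid 1/0.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


def co_membership(U):
    """Gene-by-gene co-membership matrix; unchanged if clusters are relabeled."""
    return U @ U.T


def correlation_distance(U_a, U_b):
    """Assumed distance: 1 - Pearson correlation of off-diagonal co-memberships."""
    iu = np.triu_indices(U_a.shape[0], k=1)
    a = co_membership(U_a)[iu]
    b = co_membership(U_b)[iu]
    return 1.0 - np.corrcoef(a, b)[0, 1]


if __name__ == "__main__":
    # Two hypothetical "sources" describing the same 20 genes (synthetic data).
    rng = np.random.default_rng(42)
    base = np.vstack([rng.normal(0, 0.3, (10, 4)), rng.normal(3, 0.3, (10, 4))])
    source_a = base + rng.normal(0, 0.1, base.shape)
    source_b = base + rng.normal(0, 0.5, base.shape)

    U_a, _ = fuzzy_c_means(source_a, c=2)
    U_b, _ = fuzzy_c_means(source_b, c=2)
    print("distance between the two soft clusterings:", correlation_distance(U_a, U_b))
```

Because each gene receives a membership degree in every cluster rather than a single label, a gene involved in several pathways can belong partially to several clusters, which is the limitation of hard clustering that the abstract raises against MSC.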