A Study of an Effective Soft Clustering Approach to Mining Gene Expressions from Multi-Source Databases


Bibliographic Details
Main Authors: Hsiu-min Chuang, 莊秀敏
Other Authors: Chien-I Lee
Format: Others
Language: zh-TW
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/20295146071817100481
Description
Summary: Master's === National University of Tainan === Master's Program, Graduate Institute of Information Education === 94 === With advances in biotechnology, enormous biological databases have become useful data warehouses, holding microarray data, biomedical literature, sequence data, genome structure data, and so on. In recent years, a major topic in bioinformatics has been mining hidden and meaningful information from heterogeneous data. The goal of such mining is to achieve higher accuracy than a single dataset allows and to predict gene-gene relations and genetic networks. Multi-Source Clustering (MSC) is an important and representative approach to mining multiple sources; however, MSC does not account for the fact that a gene may have multiple functions and take part in several biological pathways. MSC also ignores that heterogeneous data sources may differ in their properties and accuracy. In this study, we propose Multi-Source Soft Clustering (MSSC), which uses fuzzy c-means and soft CAST to address these problems. MSSC clusters each source before integrating the results to improve overall accuracy, and uses the correlation coefficient to calculate the distance between different soft clusterings. Finally, the experiments show that MSSC is more accurate than MSC in both general and specific cases.
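
The soft clustering idea in the summary can be illustrated with a minimal sketch. It is not the thesis's MSSC implementation: it shows only textbook fuzzy c-means, in which each gene receives a degree of membership in every cluster, plus one assumed way of comparing soft clusters with the Pearson correlation coefficient (distance = 1 - r between two clusters' membership vectors over the same genes). The function names, the fuzzifier `m = 2.0`, the fixed iteration count, and that distance definition are all assumptions made for illustration; soft CAST and the integration step are not shown.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return U where U[i][k] is the membership of point i in cluster k."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point lies exactly on one or more centers: split full
            # membership evenly among those centers.
            hits = [d == 0.0 for d in dists]
            memberships.append([1.0 / sum(hits) if h else 0.0 for h in hits])
        else:
            memberships.append(
                [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]
            )
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Recompute each center as the membership-weighted mean of the points."""
    dim = len(points[0])
    centers = []
    for k in range(len(memberships[0])):
        weights = [row[k] ** m for row in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * p[j] for w, p in zip(weights, points)) / total
             for j in range(dim)]
        )
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        memberships = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, memberships, m)
    return fcm_memberships(points, centers, m), centers


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def correlation_distance(x, y):
    """Assumed distance between two soft clusters: 1 - Pearson correlation."""
    return 1.0 - pearson(x, y)
```

In this sketch, a gene that takes part in several pathways would show sizeable memberships in more than one cluster instead of being forced into a single one. To compare clusters obtained from two data sources, one could pass the membership column of a cluster from each source (over the same ordered list of genes) to `correlation_distance`.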