Data Mining of Disease Susceptibility Genes Using Clustering Methods

Bibliographic Details
Main Authors: Li-Shu Wang, 王麗淑
Other Authors: John Jen Tai
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/77987835778085854589
Description
Summary: Master's === Fu Jen Catholic University (輔仁大學) === Graduate Institute of Applied Statistics === 95 === With the advent of rapid genome sequencing, data mining has become one of the most efficient tools for searching for genes that may underlie susceptibility to disease. When disease susceptibility genes must be sought among thousands of available markers, clustering methods provide a time-saving way to make the search feasible in practical analysis. In this thesis we propose a clustering method that classifies the tested markers into two groups, an associated group and a non-associated group. Markers in the associated group have a stronger association with the disease than those in the non-associated group. P-values obtained from case-control data are used as the genetic distance in the clustering process. Simulation studies were conducted to investigate the performance of the method. The study is a preliminary investigation into the feasibility of using P-values as the genetic distance for clustering. The results showed that in some simulations (e.g., when linkage disequilibrium is high) the method can correctly identify the disease genes.
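The general idea of splitting markers into an associated and a non-associated group by their case-control P-values can be illustrated with a minimal Python sketch. The summary does not say which association test or which clustering algorithm the thesis uses, so everything here is an assumption made for illustration: an allelic Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts, a one-dimensional two-means split on -log10(P), and hypothetical marker names and counts. It is not the authors' procedure.

```python
import math


def allelic_p_value(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square P-value (1 df) for a 2x2 allele-count table.

    Assumption: an allelic test is used; the thesis summary does not
    name the test that produced its P-values.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    denom = row_case * row_ctrl * col_a * col_b
    if denom == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / denom
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def two_group_split(p_values, n_iter=100):
    """Split a non-empty list of P-values into two groups.

    Runs 1-D two-means on -log10(P), an illustrative stand-in for the
    thesis's clustering step. Returns a list of booleans, True for
    markers assigned to the high-score ("associated") group.
    """
    scores = [-math.log10(max(p, 1e-300)) for p in p_values]
    lo, hi = min(scores), max(scores)  # initial cluster centres
    labels = [False] * len(scores)
    for _ in range(n_iter):
        labels = [abs(s - hi) < abs(s - lo) for s in scores]
        hi_members = [s for s, flag in zip(scores, labels) if flag]
        lo_members = [s for s, flag in zip(scores, labels) if not flag]
        if not hi_members or not lo_members:
            break  # all markers fell into one group
        new_lo = sum(lo_members) / len(lo_members)
        new_hi = sum(hi_members) / len(hi_members)
        if new_lo == lo and new_hi == hi:
            break  # centres stopped moving
        lo, hi = new_lo, new_hi
    return labels


# Hypothetical allele counts: (case_a, case_b, ctrl_a, ctrl_b) per marker.
markers = {
    "M1": (60, 40, 50, 50),
    "M2": (80, 20, 50, 50),
    "M3": (52, 48, 50, 50),
    "M4": (49, 51, 50, 50),
}
p_values = [allelic_p_value(*counts) for counts in markers.values()]
associated = two_group_split(p_values)
print(dict(zip(markers, associated)))
```

With these made-up counts only M2 lands in the associated group. A real analysis would also have to account for correlation between nearby markers (linkage disequilibrium), which the summary identifies as a factor in how well the method performs.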