Application of Density Estimation Algorithms in Analysis of Large Medical Databases

Bibliographic Details
Main Authors: Meng-Han Yang, 楊孟翰
Other Authors: 歐陽彥正
Format: Others
Language: en_US
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/40408014872129703819
Description
Summary: Doctoral === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 100 === The current vision in biomedical informatics is to process data automatically and make intelligent decisions. In this thesis, we propose a density-estimation-based data analysis procedure and use it to investigate the co-morbid associations between migraine and the diseases suspected to be associated with it. The primary objective of the study is to develop a novel analysis procedure that can discover insightful knowledge from large medical databases. The procedure consists of two stages. In the first stage, a kernel density estimation algorithm named RVKDE is invoked to identify the samples of interest. In the second stage, a density estimation algorithm based on generalized Gaussian components, named G2DE, is invoked to provide a summarized description of the distribution. Because migraine is a prevalent but consistently underestimated neurological disorder, we mine its co-morbid associations with multiple psychiatric and somatic illnesses. The data source is the National Health Insurance Research Database (NHIRD) of Taiwan, whose major strength is that it is a large, population-based medical claims database. Applying the proposed two-stage procedure to the co-morbidities of migraine shows that it can effectively identify a number of clusters of samples with distinctive characteristics. Furthermore, the distinctive characteristics of these clusters are consistent with recently reported findings in biomedical research. Accordingly, the proposed procedure could be used to provide valuable clues to pathogenesis and to facilitate the development of appropriate treatment strategies.
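
The two-stage idea in the summary (first pick out samples of interest with a kernel density estimate, then summarize them with a small number of Gaussian components) can be illustrated with a toy sketch. This is not RVKDE or G2DE, whose details the record does not give: a fixed-bandwidth Gaussian KDE and a plain EM-fitted one-dimensional Gaussian mixture are swapped in as stand-ins. The synthetic data, the function names, the bandwidth, and the fraction of samples kept are all assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def kde_scores(samples, bandwidth):
    """Stage 1 stand-in: fixed-bandwidth Gaussian KDE evaluated at each sample."""
    n = len(samples)
    return [sum(gaussian_pdf(x, s, bandwidth) for s in samples) / n
            for x in samples]


def select_dense(samples, bandwidth, keep_fraction):
    """Keep the fraction of samples with the highest estimated density."""
    scores = kde_scores(samples, bandwidth)
    ranked = sorted(zip(scores, samples), reverse=True)
    keep = max(1, int(len(samples) * keep_fraction))
    return [x for _, x in ranked[:keep]]


def fit_mixture(samples, initial_means, iterations=50, min_std=1e-3):
    """Stage 2 stand-in: EM for a 1-D Gaussian mixture.

    Returns a list of (mean, std, weight) tuples sorted by mean.
    """
    k = len(initial_means)
    means = list(initial_means)
    stds = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            p = [w * gaussian_pdf(x, m, s)
                 for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([v / total for v in p])
        # M-step: re-estimate weight, mean, and spread of each component.
        for j in range(k):
            rj = sum(r[j] for r in resp)
            if rj < 1e-12:
                continue  # component has no support; leave it unchanged
            weights[j] = rj / len(samples)
            means[j] = sum(r[j] * x for r, x in zip(resp, samples)) / rj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, samples)) / rj
            stds[j] = max(math.sqrt(var), min_std)
    return sorted(zip(means, stds, weights))


def demo():
    """Run both stages on synthetic data (two clusters plus sparse noise)."""
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(200)]
    data += [rng.gauss(10.0, 1.0) for _ in range(200)]
    data += [rng.uniform(-20.0, 30.0) for _ in range(40)]  # background noise
    dense = select_dense(data, bandwidth=1.0, keep_fraction=0.8)
    return fit_mixture(dense, initial_means=[min(dense), max(dense)])


if __name__ == "__main__":
    for mean, std, weight in demo():
        print(f"mean={mean:.2f} std={std:.2f} weight={weight:.2f}")
```

In this toy setting the first stage discards the sparse background points, and the second stage reduces the remaining samples to a handful of (mean, spread, weight) triples, which is the kind of compact cluster description the summary attributes to the second stage.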