Summary: | Master's === Huafan University (華梵大學) === Master's Program, Department of Information Management === 98 === Data mining has recently been widely applied to medical diagnosis. However, most medical databases are diverse and heterogeneous, and their minority classes contain a large number of outliers, which reduces the accuracy of subsequent data mining. Moreover, deleting every record that contains an outlier from the minority class of an imbalanced database would leave too few samples and further reduce the accuracy of subsequent data classification. Instead, the only viable approach is to adjust the outliers and return the records to the data mining process.
Taking diabetes databases as an example of an imbalanced database whose minority class contains outliers, this study uses LVF (Las Vegas Filter) to select the features related to the outliers and a BPN (Back-Propagation Neural Network) to adjust the outliers, in order to improve the accuracy of subsequent data classification.
Compared with the traditional t-test and χ² test, the results show that LVF selects the attributes relevant to the data items containing outliers and thereby improves the accuracy of subsequent data classification.
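To illustrate the feature-selection step, the following is a minimal Python sketch of a Las Vegas Filter in its commonly described form: random feature subsets are drawn, and a smaller subset is kept whenever its inconsistency rate stays within a threshold. It is not the thesis's implementation. The function names, the `max_tries` and `threshold` parameters, and the toy data are all assumptions made for illustration, and the sketch assumes discrete feature values (continuous clinical measurements would need to be discretized first).

```python
import random
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, features):
    """Fraction of records that disagree with the majority label of the
    group sharing the same values on the selected features."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        groups[key][label] += 1
    # In each group, every record outside the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def lvf(rows, labels, max_tries=1000, threshold=0.0, seed=0):
    """Illustrative LVF: keep the smallest random feature subset whose
    inconsistency rate does not exceed the threshold (hypothetical parameters)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    best = list(range(n_features))  # start from the full feature set
    for _ in range(max_tries):
        size = rng.randint(1, len(best))
        candidate = sorted(rng.sample(range(n_features), size))
        if (len(candidate) < len(best)
                and inconsistency_rate(rows, labels, candidate) <= threshold):
            best = candidate
    return best


# Toy data (made up): feature 0 alone determines the label.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
labels = [0, 0, 1, 1]
selected = lvf(rows, labels)
```

Because the search is randomized, the result depends on the number of tries and the seed. A nonzero `threshold` would tolerate some label noise, which may matter when the minority class contains outliers.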
|