Summary: | Master's === Huafan University === Master's Program, Department of Information Management === 98 === Data mining is now widely used by many enterprises. Data can go missing when paper records are transferred to an electronic system, because of human error or out-of-date information. Records with missing values are usually deleted, or the missing values are filled with the average, 0, or the mode, but these approaches are only suitable when little data is missing. Otherwise they reduce the accuracy of the data and ultimately fail to give the user reliable information.
This thesis uses open datasets for testing. Values are removed at random from the open datasets, and the missing numerical values are then backfilled using the average, 0, a Back-propagation Network (BPN), and Support Vector Regression (SVR). Finally, a regression tree is used to compare the results. The results show that the values predicted by SVR have the smallest average error relative to the original values.
|
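The evaluation procedure described in the summary (hide known values at random, backfill them, then measure the average error against the originals) can be sketched roughly as below. This is only an illustration under stated assumptions, not the thesis's implementation: the data are synthetic, all function names are hypothetical, and a simple least-squares line stands in for the BPN and SVR models, which would need an external library.

```python
import random
import statistics


def mask_at_random(values, fraction, rng):
    """Hide a random fraction of values; return the masked copy and hidden indices."""
    n_hide = max(1, int(len(values) * fraction))
    hidden = sorted(rng.sample(range(len(values)), n_hide))
    masked = list(values)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def fit_line(xs, ys):
    """Ordinary least-squares line (a stand-in for BPN/SVR, not the thesis's models)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(truth, filled, hidden):
    """Average absolute error, measured only on the positions that were hidden."""
    return statistics.fmean(abs(truth[i] - filled[i]) for i in hidden)


def compare_fills(xs, ys, fraction=0.3, seed=0):
    """Backfill hidden y values with 0, the mean, and a fitted line; return errors."""
    rng = random.Random(seed)
    masked, hidden = mask_at_random(ys, fraction, rng)
    observed = [(x, y) for x, y in zip(xs, masked) if y is not None]
    obs_x = [x for x, _ in observed]
    obs_y = [y for _, y in observed]
    mean_y = statistics.fmean(obs_y)
    slope, intercept = fit_line(obs_x, obs_y)
    fills = {
        "zero": lambda x: 0.0,
        "mean": lambda x: mean_y,
        "regression": lambda x: slope * x + intercept,
    }
    errors = {}
    for name, predict in fills.items():
        filled = [y if y is not None else predict(x) for x, y in zip(xs, masked)]
        errors[name] = mean_abs_error(ys, filled, hidden)
    return errors


# Synthetic, exactly linear data (an assumption made purely for illustration).
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + 5.0 for x in xs]
errors = compare_fills(xs, ys)
for name, err in errors.items():
    print(f"{name}: {err:.4f}")
```

On data like this, where the target depends strongly on another attribute, a model-based fill recovers the hidden values far better than a constant fill. How large the gap is on real datasets depends on the data and the model used.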