Solving Unbalanced Data by Considering the Probability Distribution of Historical Cases

Bibliographic Details
Main Author: Chia-Hui Wu (吳家慧)
Other Authors: Min-Yuan Cheng
Format: Others
Language: zh-TW
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/27603126761093639239
Description
Summary: Master's thesis, National Taiwan University of Science and Technology, Department of Construction Engineering, ROC academic year 103 (2014-2015).

The problem of imbalanced datasets has received much attention in recent years. When the classes of training samples are unevenly represented, the classification accuracy of artificial intelligence (AI) models suffers, because such models need a large and evenly distributed body of data for training and testing. Finding effective ways to address this problem is therefore an important issue. Classification accuracy is currently the standard measure that classification analysis techniques use to judge whether a classification model is good or bad. In imbalanced classification problems, however, the skewed data biases training and leads to very low prediction accuracy for the minority (MI) class. The cause is the imbalance itself: the majority (MA) samples in such a dataset far outnumber the MI samples, so general classification analysis techniques suffer from a serious class prediction bias toward the MA class.

To address this, the study uses a "probability distribution balanced data sampling method" to balance the dataset and combines it with the SOS-LSSVM classifier (a least squares support vector machine, LSSVM, tuned by the symbiotic organisms search, SOS, algorithm) to increase prediction accuracy. The results are used to draw ROC curves and to calculate the area under the curve (AUC), which evaluates the effectiveness of the resampling method and demonstrates that the proposed approach can effectively solve the unbalanced data problem and improve the prediction accuracy of AI.
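The abstract does not say exactly how the "probability distribution balanced data sampling method" draws its samples, so the Python sketch below is only an illustration under stated assumptions, not the thesis's method. It assumes each feature of the historical minority (MI) cases follows an independent normal distribution and draws synthetic MI cases from those distributions until both classes are the same size. It also computes AUC from classifier scores using the pairwise (Mann-Whitney) formulation, and shows why plain accuracy can mislead on imbalanced data. The function names `gaussian_oversample`, `balance`, and `roc_auc` are hypothetical, and the SOS-LSSVM classifier is not reproduced.

```python
import random
import statistics


def gaussian_oversample(minority, n_new, rng):
    """Draw n_new synthetic minority (MI) cases.

    HYPOTHETICAL stand-in for a distribution-based sampler: each feature of
    the MI class is modelled as an independent normal distribution whose mean
    and standard deviation are estimated from the historical MI cases.
    Requires at least two MI cases (statistics.stdev needs two data points).
    """
    n_features = len(minority[0])
    columns = [[row[j] for row in minority] for j in range(n_features)]
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        [rng.gauss(mu, sigma) for mu, sigma in zip(means, stdevs)]
        for _ in range(n_new)
    ]


def balance(X, y, minority_label, seed=0):
    """Oversample the MI class of a binary dataset until it matches the MA class."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    n_new = max(0, n_majority - len(minority))
    synthetic = gaussian_oversample(minority, n_new, rng) if n_new else []
    # Return new lists; the original X and y are left unchanged.
    return X + synthetic, y + [minority_label] * len(synthetic)


def roc_auc(labels, scores, positive=1):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form):
    the fraction of (positive, negative) pairs in which the positive case
    receives the higher score, with ties counted as one half."""
    pos = [s for s, label in zip(scores, labels) if label == positive]
    neg = [s for s, label in zip(scores, labels) if label != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: 95 majority (MA, label 0) cases vs 5 minority (MI, label 1) cases.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)]
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(5)]
    y = [0] * 95 + [1] * 5

    # A model that always predicts MA reaches 95% accuracy yet has no ranking skill.
    accuracy = sum(1 for label in y if label == 0) / len(y)
    print(accuracy, roc_auc(y, [0.0] * len(y)))  # 0.95 0.5

    X_bal, y_bal = balance(X, y, minority_label=1)
    print(y_bal.count(0), y_bal.count(1))  # 95 95
```

The constant "always MA" predictor illustrates the bias the abstract describes: its accuracy looks high, but its AUC of 0.5 reveals that it cannot separate MI from MA cases. AUC is computed here from pairwise rankings rather than by tracing the ROC curve point by point; the two give the same area.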