Solving Unbalanced Data by Considering the Probability Distribution of Historical Cases

Master's === National Taiwan University of Science and Technology === Department of Construction Engineering === 103 === The problem of imbalanced datasets has received much attention in recent years. When the classes of training samples are unbalanced, the classification accuracy of artificial intelligence suffers, because artificial intelligence requires a large amount of data a...

Bibliographic Details
Main Authors: Chia-Hui Wu, 吳家慧
Other Authors: Min-Yuan Cheng
Format: Others
Language: zh-TW
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/27603126761093639239
id ndltd-TW-103NTUS5512062
record_format oai_dc
spelling ndltd-TW-103NTUS5512062 2017-01-07T04:08:46Z http://ndltd.ncl.edu.tw/handle/27603126761093639239 Solving Unbalanced Data by Considering the Probability Distribution of Historical Cases 考量歷史案例機率分佈於解決不平衡資料之問題 Chia-Hui Wu 吳家慧 碩士 國立臺灣科技大學 營建工程系 103 Min-Yuan Cheng 鄭明淵 2015 學位論文 ; thesis 89 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan University of Science and Technology === Department of Construction Engineering === 103 === The problem of imbalanced datasets has received much attention in recent years. When the classes of training samples are unbalanced, the classification accuracy of artificial intelligence models suffers, because such models need a large and evenly distributed body of data for training and testing. How to mitigate this problem effectively is an important issue. Classification accuracy is currently the measure used to judge whether a classification model is good or bad. In classification problems, imbalanced data biases training and leads to very low prediction accuracy for the minority (MI) class. The bias arises because the majority (MA) samples in such a dataset far outnumber the MI samples, so general classification techniques suffer from serious class prediction bias. To address this, this study uses the "probability distribution balanced data sampling method" to balance the dataset and combines it with the SOS-LSSVM classifier to increase prediction accuracy. The results are used to draw ROC curves and calculate the area under the curve (AUC) in order to evaluate the effectiveness of the resampling method and to show that the proposed method can effectively address the problem of unbalanced data and improve the prediction accuracy of artificial intelligence.
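The following Python sketch illustrates the two ideas the abstract relies on: balancing a dataset by drawing extra minority samples from a fitted probability distribution, and scoring a classifier with AUC rather than plain accuracy. This record does not describe how the thesis's "probability distribution balanced data sampling method" actually works, so the per-feature normal fit below is only an assumed stand-in. The function names `oversample_minority` and `auc`, the toy data, and the sum-of-features score are likewise hypothetical. SOS-LSSVM is not implemented here.

```python
import random
import statistics


def oversample_minority(minority, target_size, rng):
    """Pad the minority class up to target_size rows.

    Assumption for illustration only: each feature is modeled as an
    independent normal distribution fitted to the observed minority rows.
    """
    columns = list(zip(*minority))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    synthetic = [
        [rng.gauss(m, s) for m, s in zip(means, stdevs)]
        for _ in range(target_size - len(minority))
    ]
    return [list(row) for row in minority] + synthetic


def auc(scores_pos, scores_neg):
    """Area under the ROC curve in its rank form: the fraction of
    (positive, negative) pairs where the positive scores higher,
    with ties counted as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy imbalanced data: 200 majority (MA) rows versus 10 minority (MI) rows.
majority = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
minority = [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(10)]

# Balance the classes by sampling from the fitted minority distribution.
balanced_minority = oversample_minority(minority, len(majority), rng)

# Always predicting the majority class looks accurate but never finds MI cases.
majority_only_accuracy = len(majority) / (len(majority) + len(minority))

# Toy scoring rule (sum of features) evaluated with AUC on the original data.
auc_value = auc([sum(row) for row in minority], [sum(row) for row in majority])

print(f"accuracy of always predicting MA: {majority_only_accuracy:.3f}")
print(f"MI rows after balancing: {len(balanced_minority)}")
print(f"AUC of the toy score: {auc_value:.3f}")
```

The always-MA baseline shows why accuracy alone can mislead on imbalanced data: it scores about 95% here while detecting no minority cases. AUC depends only on how positives rank against negatives, so it does not reward that shortcut.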
author2 Min-Yuan Cheng
author_facet Min-Yuan Cheng
Chia-Hui Wu
吳家慧
author Chia-Hui Wu
吳家慧
author_sort Chia-Hui Wu
title Solving Unbalanced Data by Considering the Probability Distribution of Historical Cases
publishDate 2015
url http://ndltd.ncl.edu.tw/handle/27603126761093639239