Solving Unbalanced Data by Considering the Probability Distribution of Historical Cases
Master's thesis === National Taiwan University of Science and Technology === Department of Construction Engineering === 103 === The problem of imbalanced datasets has received much attention in recent years. When the classes of training samples are unbalanced, the classification accuracy of artificial intelligence models suffers, because such models require a large amount of data a...
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Others |
Language: | zh-TW |
Published: |
2015
|
Online Access: | http://ndltd.ncl.edu.tw/handle/27603126761093639239 |
id |
ndltd-TW-103NTUS5512062 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-103NTUS5512062 2017-01-07T04:08:46Z http://ndltd.ncl.edu.tw/handle/27603126761093639239 Solving Unbalanced Data by Considering the Probability Distribution of Historical Cases 考量歷史案例機率分佈於解決不平衡資料之問題 Chia-Hui Wu 吳家慧 Master's thesis National Taiwan University of Science and Technology Department of Construction Engineering 103 Min-Yuan Cheng 鄭明淵 2015 學位論文 ; thesis 89 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others
|
sources |
NDLTD |
description |
Master's thesis === National Taiwan University of Science and Technology === Department of Construction Engineering === 103 === The problem of imbalanced datasets has received much attention in recent years. When the classes of training samples are unbalanced, the classification accuracy of artificial intelligence models suffers, because such models need large and evenly distributed data for training and testing. How to effectively mitigate this problem is an important issue.
Classification accuracy is currently the standard measure that classification analysis techniques use to judge whether a classification model is good or bad. In classification problems, imbalanced data biases training and leads to very low prediction accuracy for the minority (MI) class. The cause is the imbalance itself: in such datasets, the number of majority (MA) samples far exceeds the number of MI samples. As a result, general classification analysis techniques suffer from serious class prediction bias.
In summary, this study uses the "probability distribution balanced data sampling method" to balance the dataset and combines it with the SOS-LSSVM classifier to increase prediction accuracy. The results are used to draw ROC curves and calculate the area under the curve (AUC) in order to evaluate the effectiveness of this resampling method and to demonstrate that the proposed method can effectively solve the problem of unbalanced data and improve the prediction accuracy of artificial intelligence.
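The bias described above can be illustrated with a small numeric sketch. The class counts below are hypothetical and chosen only for illustration; they are not taken from the thesis. A classifier that always predicts the majority class reaches high overall accuracy while never identifying a single minority sample.

```python
# Hypothetical class counts, for illustration only (not from the thesis).
n_majority = 95  # MA samples
n_minority = 5   # MI samples

labels = ["MA"] * n_majority + ["MI"] * n_minority

# A degenerate classifier that always predicts the majority class.
predictions = ["MA"] * len(labels)

# Overall accuracy: fraction of all samples predicted correctly.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Minority recall: fraction of MI samples that were predicted as MI.
mi_hits = sum(p == y == "MI" for p, y in zip(predictions, labels))
mi_recall = mi_hits / n_minority

print(f"accuracy = {accuracy:.2f}, MI recall = {mi_recall:.2f}")
```

Under these assumed counts the overall accuracy looks strong even though the minority class is missed entirely, which is why accuracy alone can be misleading on imbalanced data.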
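As a minimal sketch of the AUC evaluation mentioned above, the area under the ROC curve can be computed as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The function name `roc_auc`, the label encoding (1 for the minority class, 0 for the majority class), and the sample scores are assumptions made for illustration; this is not the thesis's own implementation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: sequence of 0/1 values (1 = positive class; assumed encoding).
    scores: classifier scores, where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for two minority (1) and four majority (0) samples.
example_labels = [1, 0, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
example_auc = roc_auc(example_labels, example_scores)
print(example_auc)
```

An AUC of 1.0 means every positive sample outranks every negative one, and 0.5 corresponds to random ranking. Unlike accuracy, this value does not depend on the ratio of majority to minority samples, which makes it a common choice for judging a resampling method.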
|
author2 |
Min-Yuan Cheng |
author_facet |
Min-Yuan Cheng Chia-Hui Wu 吳家慧 |
author |
Chia-Hui Wu 吳家慧 |
author_sort |
Chia-Hui Wu |
title |
Solving Unbalanced Data by Considering the Probability Distribution of Historical Cases |
publishDate |
2015 |
url |
http://ndltd.ncl.edu.tw/handle/27603126761093639239 |