Solving Unbalanced Data by Considering the Probability Distribution of Historical Cases

Master's === National Taiwan University of Science and Technology === Department of Civil and Construction Engineering === 103 === The problem of imbalanced datasets has received much attention in recent years. When the different types of training samples are unbalanced, the classification accuracy of artificial intelligence suffers, because artificial intelligence requires a large amount of data a...

Bibliographic Details
Main Author:	Chia-Hui Wu (吳家慧)
Other Authors:	Min-Yuan Cheng
Format:	Others
Language:	zh-TW
Published:	2015
Online Access:	http://ndltd.ncl.edu.tw/handle/27603126761093639239

id	ndltd-TW-103NTUS5512062
record_format	oai_dc
spelling	ndltd-TW-103NTUS5512062 2017-01-07T04:08:46Z http://ndltd.ncl.edu.tw/handle/27603126761093639239 Solving Unbalanced Data by Considering the Probability Distribution of Historical Cases 考量歷史案例機率分佈於解決不平衡資料之問題 Chia-Hui Wu 吳家慧 Master's, National Taiwan University of Science and Technology, Department of Civil and Construction Engineering, 103 Min-Yuan Cheng 鄭明淵 2015 thesis 89 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Taiwan University of Science and Technology === Department of Civil and Construction Engineering === 103 === The problem of imbalanced datasets has received much attention in recent years. When the different types of training samples are unbalanced, the classification accuracy of artificial intelligence suffers, because artificial intelligence needs a large amount of evenly distributed data for training and testing. How to mitigate this problem effectively is an important issue. Classification accuracy is currently the measure that classification analysis techniques use to judge whether a classification model is good or bad. In classification problems, imbalanced data biases training and leads to very low prediction accuracy for the MI type. The cause is that, in such data, the number of MA samples far exceeds the number of MI samples, which gives general classification analysis techniques a serious class prediction bias. In light of the above, this study uses the "probability distribution balanced data sampling method" to balance the dataset and combines it with the SOS-LSSVM classifier to increase prediction accuracy. The results are used to draw ROC curves and calculate the area under the curve (AUC) in order to evaluate the effectiveness of this resampling method and to show that the method can effectively address the problem of unbalanced data and improve the prediction accuracy of artificial intelligence.
author2	Min-Yuan Cheng
author_facet	Min-Yuan Cheng Chia-Hui Wu 吳家慧
author	Chia-Hui Wu 吳家慧
spellingShingle	Chia-Hui Wu 吳家慧 Solving Unbalanced Data by Considering the Probability Distribution of Historical Cases
author_sort	Chia-Hui Wu
title	Solving Unbalanced Data by Considering the Probability Distribution of Historical Cases
title_sort	solving unbalanced data by considering the probability distribution of historical cases
publishDate	2015
url	http://ndltd.ncl.edu.tw/handle/27603126761093639239
work_keys_str_mv	AT chiahuiwu solvingunbalanceddatabyconsideringtheprobabilitydistributionofhistoricalcases AT wújiāhuì solvingunbalanceddatabyconsideringtheprobabilitydistributionofhistoricalcases AT chiahuiwu kǎoliànglìshǐànlìjīlǜfēnbùyújiějuébùpínghéngzīliàozhīwèntí AT wújiāhuì kǎoliànglìshǐànlìjīlǜfēnbùyújiějuébùpínghéngzīliàozhīwèntí
_version_	1718407183424552960
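
The description field argues that plain accuracy is misleading when MA samples far outnumber MI samples, and that the dataset should be balanced and the result judged by AUC. The sketch below illustrates that argument only. It is not the thesis's "probability distribution balanced data sampling method" and it does not implement SOS-LSSVM: plain random oversampling stands in for the resampling step, the toy data are hypothetical, and reading MA and MI as the majority and minority classes is an assumption. All function names are illustrative.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until all
    classes have as many samples as the largest one.

    Stand-in technique only; not the sampling method proposed in the thesis.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def auc(scores, labels, positive=1):
    """Area under the ROC curve in its rank (Mann-Whitney) form: the
    probability that a random positive outscores a random negative,
    with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical toy data: 95 MA cases (label 0) and 5 MI cases (label 1).
labels = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts the MA class.
always_majority = [0] * 100
print(accuracy(always_majority, labels))   # high accuracy, yet no MI case is found
print(auc([0.0] * 100, labels))            # constant scores carry no ranking information

# Balance the classes before training (sample ids stand in for feature vectors).
balanced_x, balanced_y = random_oversample(list(range(100)), labels)
print(Counter(balanced_y))
```

Oversampling should be applied to the training split only, so that duplicated MI cases do not leak into the data used to compute AUC.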
