A Novel RFOS Method for Imbalanced Data Classification
Master's === Fu Jen Catholic University === Master's Program, Department of Mathematics === 106 === With the continued growth of the Internet and of stored data, data analysis methods have been widely applied in many fields. Therefore, making efficient decisions to classify or predict an issue by mining the rules of data c...
Main Authors: | PAI,TING-WEI, 白庭瑋 |
---|---|
Other Authors: | HUANG,HSIAO-YUN 黃孝雲 |
Format: | Others |
Language: | zh-TW |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/efu7nm |
id |
ndltd-TW-106FJU00479004 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106FJU00479004 2019-05-16T00:37:24Z http://ndltd.ncl.edu.tw/handle/efu7nm A Novel RFOS Method for Imbalanced Data Classification 針對不平衡資料分類之RFOS超抽樣法 PAI,TING-WEI 白庭瑋 Master's, Fu Jen Catholic University, Master's Program, Department of Mathematics, 106. With the continued growth of the Internet and of stored data, data analysis methods have been widely applied in many fields. Making efficient decisions to classify or predict an issue by mining the rules of data classification has therefore become increasingly important. However, real-life classification data are often imbalanced: the sample size of one class is smaller than that of the other classes, and classification methods tend to have a higher error rate on that class. Many studies have proposed improvements to address this problem, the most popular of which is SMOTE. Although SMOTE can largely solve the problem, it has a flaw of its own: it cannot process mixed data efficiently. This research therefore combines missing-value imputation with a random forest similarity matrix to obtain higher and more balanced accuracy. Four different data sets are selected for analysis together with the SMOTE method. The results show that the classification of the minority and majority classes is more balanced and more accurate in most situations. In conclusion, the RFOS (Random Forest Over-Sampling) method developed in this research is an efficient way to handle imbalanced data. HUANG,HSIAO-YUN 黃孝雲 2018 thesis 62 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === Fu Jen Catholic University === Master's Program, Department of Mathematics === 106 === With the continued growth of the Internet and of stored data, data analysis methods have been widely applied in many fields. Making efficient decisions to classify or predict an issue by mining the rules of data classification has therefore become increasingly important. However, real-life classification data are often imbalanced: the sample size of one class is smaller than that of the other classes, and classification methods tend to have a higher error rate on that class. Many studies have proposed improvements to address this problem, the most popular of which is SMOTE. Although SMOTE can largely solve the problem, it has a flaw of its own: it cannot process mixed data efficiently. This research therefore combines missing-value imputation with a random forest similarity matrix to obtain higher and more balanced accuracy. Four different data sets are selected for analysis together with the SMOTE method. The results show that the classification of the minority and majority classes is more balanced and more accurate in most situations. In conclusion, the RFOS (Random Forest Over-Sampling) method developed in this research is an efficient way to handle imbalanced data. |
author2 |
HUANG,HSIAO-YUN |
author_facet |
HUANG,HSIAO-YUN PAI,TING-WEI 白庭瑋 |
author |
PAI,TING-WEI 白庭瑋 |
spellingShingle |
PAI,TING-WEI 白庭瑋 A Novel RFOS Method for Imbalanced Data Classification |
author_sort |
PAI,TING-WEI |
title |
A Novel RFOS Method for Imbalanced Data Classification |
title_sort |
novel rfos method for imbalanced data classification |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/efu7nm |
work_keys_str_mv |
AT paitingwei anovelrfosmethodforimbalanceddataclassification AT báitíngwěi anovelrfosmethodforimbalanceddataclassification AT paitingwei zhēnduìbùpínghéngzīliàofēnlèizhīrfoschāochōuyàngfǎ AT báitíngwěi zhēnduìbùpínghéngzīliàofēnlèizhīrfoschāochōuyàngfǎ AT paitingwei novelrfosmethodforimbalanceddataclassification AT báitíngwěi novelrfosmethodforimbalanceddataclassification |
_version_ |
1719168413749739520 |