A Novel RFOS Method for Imbalanced Data Classification

Master's === Fu Jen Catholic University === Department of Mathematics, Master's Program === 106 === With the continued development of the Internet and the steady growth of stored data, data analysis methods have been widely applied in many fields. Providing efficient decisions to classify or predict an issue by mining the rules of data c...

Bibliographic Details
Main Authors: PAI,TING-WEI, 白庭瑋
Other Authors: HUANG,HSIAO-YUN
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/efu7nm
id ndltd-TW-106FJU00479004
record_format oai_dc
spelling ndltd-TW-106FJU00479004 2019-05-16T00:37:24Z http://ndltd.ncl.edu.tw/handle/efu7nm A Novel RFOS Method for Imbalanced Data Classification 針對不平衡資料分類之RFOS超抽樣法 PAI,TING-WEI 白庭瑋 Master's Fu Jen Catholic University Department of Mathematics, Master's Program 106 With the continued development of the Internet and the steady growth of stored data, data analysis methods have been widely applied in many fields. Providing efficient decisions to classify or predict an issue by mining the rules of data classification has therefore become increasingly important. Real-life classification data, however, are often imbalanced: when the sample size of one class is smaller than that of the other classes, the classification methods employed tend to have a higher error rate on that class. Many studies have proposed improvements for this problem, and the most popular method is SMOTE. Although SMOTE largely solves the problem described above, it has another flaw: it cannot process mixed data efficiently. This research therefore attempts to combine missing value imputation with the random forest similarity matrix to obtain higher and more balanced accuracy. Four different data sets are selected for analysis and comparison with the SMOTE method. The results show that the classification of the minority and majority classes is more balanced and more accurate in most situations. In conclusion, the RFOS (Random Forest Over-Sampling) method developed in this research is an efficient way to handle imbalanced data. HUANG,HSIAO-YUN 黃孝雲 2018 thesis 62 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Fu Jen Catholic University === Department of Mathematics, Master's Program === 106 === With the continued development of the Internet and the steady growth of stored data, data analysis methods have been widely applied in many fields. Providing efficient decisions to classify or predict an issue by mining the rules of data classification has therefore become increasingly important. Real-life classification data, however, are often imbalanced: when the sample size of one class is smaller than that of the other classes, the classification methods employed tend to have a higher error rate on that class. Many studies have proposed improvements for this problem, and the most popular method is SMOTE. Although SMOTE largely solves the problem described above, it has another flaw: it cannot process mixed data efficiently. This research therefore attempts to combine missing value imputation with the random forest similarity matrix to obtain higher and more balanced accuracy. Four different data sets are selected for analysis and comparison with the SMOTE method. The results show that the classification of the minority and majority classes is more balanced and more accurate in most situations. In conclusion, the RFOS (Random Forest Over-Sampling) method developed in this research is an efficient way to handle imbalanced data.
author2 HUANG,HSIAO-YUN
author_facet HUANG,HSIAO-YUN
PAI,TING-WEI
白庭瑋
author PAI,TING-WEI
白庭瑋
spellingShingle PAI,TING-WEI
白庭瑋
A Novel RFOS Method for Imbalanced Data Classification
author_sort PAI,TING-WEI
title A Novel RFOS Method for Imbalanced Data Classification
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/efu7nm