Survey on Deep Learning with Imbalanced Data Sets
Master's === National Chengchi University === Department of Applied Mathematics === 108 === This paper is a survey on deep learning with imbalanced data sets and anomaly detection. We create two imbalanced data sets from MNIST: one for a multi-class classification task with minority classes 0, 1, 4, 6, and 7, and one for a binary classification task with minority class 0. Our data set...
Main Authors: | Tsai, Cheng-Hsiao 蔡承孝 |
---|---|
Other Authors: | 蔡炎龍 |
Format: | Others |
Language: | en_US |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/b8r339 |
id |
ndltd-TW-108NCCU5507001 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-108NCCU5507001 2019-10-12T03:34:53Z http://ndltd.ncl.edu.tw/handle/b8r339 Survey on Deep Learning with Imbalanced Data Sets 深度學習在不平衡數據集之研究 Tsai, Cheng-Hsiao 蔡承孝 Master's National Chengchi University Department of Applied Mathematics 108 This paper is a survey on deep learning with imbalanced data sets and anomaly detection. We create two imbalanced data sets from MNIST: one for a multi-class classification task with minority classes 0, 1, 4, 6, and 7, and one for a binary classification task with minority class 0. Our data sets are highly imbalanced, with imbalance ratio ρ = 2500, and we use a convolutional neural network (CNN) for training. For anomaly detection, we use the pretrained CNN handwriting classifier to decide whether 18 cat and dog pictures are handwriting pictures or not. Because the data sets are imbalanced, the baseline model performs poorly on the minority classes. Hence, we use 6 and 7 different methods to adjust our model. We find that the focal loss function and random over-sampling (ROS) perform best on the multi-class and binary classification tasks on our imbalanced data sets, whereas the cost-sensitive learning method is not suitable for them. By confidence estimation, our classifier successfully judges that none of the cat and dog pictures is a handwriting picture. 蔡炎龍 2019 thesis 168 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chengchi University === Department of Applied Mathematics === 108 === This paper is a survey on deep learning with imbalanced data sets and anomaly detection. We create two imbalanced data sets from MNIST: one for a multi-class classification task with minority classes 0, 1, 4, 6, and 7, and one for a binary classification task with minority class 0. Our data sets are highly imbalanced, with imbalance ratio ρ = 2500, and we use a convolutional neural network (CNN) for training. For anomaly detection, we use the pretrained CNN handwriting classifier to decide whether 18 cat and dog pictures are handwriting pictures or not.
Because the data sets are imbalanced, the baseline model performs poorly on the minority classes. Hence, we use 6 and 7 different methods to adjust our model. We find that the focal loss function and random over-sampling (ROS) perform best on the multi-class and binary classification tasks on our imbalanced data sets, whereas the cost-sensitive learning method is not suitable for them. By confidence estimation, our classifier successfully judges that none of the cat and dog pictures is a handwriting picture.
|
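The description above names an imbalance ratio ρ, random over-sampling (ROS), and the focal loss without defining them in this record. As a rough illustration only, and not the thesis's own code, the following Python sketch assumes the commonly used definitions: ρ as the largest class size divided by the smallest, ROS as duplicating randomly chosen minority samples until every class matches the largest one, and the focal loss as -α(1 - p)^γ log p for the predicted probability p of the true class. The function names, the defaults γ = 2 and α = 1, and the toy class sizes (5000 versus 2, chosen only so that ρ = 2500) are assumptions.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: largest class size divided by smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_over_sample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def cross_entropy(p_true):
    """Cross-entropy for one sample, given the probability of its true class."""
    p = min(max(p_true, 1e-12), 1.0)  # clamp to avoid log(0)
    return -math.log(p)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: cross-entropy scaled by (1 - p)^gamma,
    so confidently correct samples contribute less to the total loss."""
    p = min(max(p_true, 1e-12), 1.0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


# Toy labels (hypothetical sizes): 5000 majority samples, 2 minority samples.
toy_labels = [1] * 5000 + [0] * 2
toy_samples = list(range(len(toy_labels)))
print(imbalance_ratio(toy_labels))  # 2500.0

balanced_x, balanced_y = random_over_sample(toy_samples, toy_labels)
print(Counter(balanced_y))
```

With γ = 0 and α = 1 the focal loss in this sketch reduces to ordinary cross-entropy; larger γ shrinks the loss on samples the model already classifies confidently, which is the usual motivation for trying it on imbalanced data.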
author2 |
蔡炎龍 |
author_facet |
蔡炎龍 Tsai, Cheng-Hsiao 蔡承孝 |
author |
Tsai, Cheng-Hsiao 蔡承孝 |
spellingShingle |
Tsai, Cheng-Hsiao 蔡承孝 Survey on Deep Learning with Imbalanced Data Sets |
author_sort |
Tsai, Cheng-Hsiao |
title |
Survey on Deep Learning with Imbalanced Data Sets |
title_short |
Survey on Deep Learning with Imbalanced Data Sets |
title_full |
Survey on Deep Learning with Imbalanced Data Sets |
title_fullStr |
Survey on Deep Learning with Imbalanced Data Sets |
title_full_unstemmed |
Survey on Deep Learning with Imbalanced Data Sets |
title_sort |
survey on deep learning with imbalanced data sets |
publishDate |
2019 |
url |
http://ndltd.ncl.edu.tw/handle/b8r339 |
work_keys_str_mv |
AT tsaichenghsiao surveyondeeplearningwithimbalanceddatasets AT càichéngxiào surveyondeeplearningwithimbalanceddatasets AT tsaichenghsiao shēndùxuéxízàibùpínghéngshùjùjízhīyánjiū AT càichéngxiào shēndùxuéxízàibùpínghéngshùjùjízhīyánjiū |
_version_ |
1719263857802739712 |