Survey on Deep Learning with Imbalanced Data Sets

Master's === National Chengchi University === Department of Applied Mathematics === 108 === This paper surveys deep learning with imbalanced data sets and anomaly detection. We create two imbalanced data sets from MNIST: one for a multi-class classification task with minority classes 0, 1, 4, 6, and 7, and one for a binary classification task with minority class 0. Our data set...

Bibliographic Details
Main Authors: Tsai, Cheng-Hsiao, 蔡承孝
Other Authors: 蔡炎龍
Format: Others
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/b8r339
id ndltd-TW-108NCCU5507001
record_format oai_dc
spelling ndltd-TW-108NCCU5507001 2019-10-12T03:34:53Z http://ndltd.ncl.edu.tw/handle/b8r339 Survey on Deep Learning with Imbalanced Data Sets 深度學習在不平衡數據集之研究 Tsai, Cheng-Hsiao 蔡承孝 Master's National Chengchi University Department of Applied Mathematics 108 This paper surveys deep learning with imbalanced data sets and anomaly detection. We create two imbalanced data sets from MNIST: one for a multi-class classification task with minority classes 0, 1, 4, 6, and 7, and one for a binary classification task with minority class 0. Both data sets are highly imbalanced, with imbalance ratio ρ = 2500, and we train a convolutional neural network (CNN) on them. For anomaly detection, we use the pretrained CNN handwriting classifier to decide whether 18 cat and dog pictures are handwriting pictures. Because the data sets are imbalanced, the baseline model performs poorly on the minority classes. Hence, we use 6 and 7 different methods to adjust our model. We find that the focal loss function and random over-sampling (ROS) perform best on the multi-class and binary classification tasks on our imbalanced data sets, whereas cost-sensitive learning is not suitable for them. Using confidence estimation, our classifier correctly judges that none of the cat and dog pictures are handwriting pictures. 蔡炎龍 2019 thesis 168 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Chengchi University === Department of Applied Mathematics === 108 === This paper surveys deep learning with imbalanced data sets and anomaly detection. We create two imbalanced data sets from MNIST: one for a multi-class classification task with minority classes 0, 1, 4, 6, and 7, and one for a binary classification task with minority class 0. Both data sets are highly imbalanced, with imbalance ratio ρ = 2500, and we train a convolutional neural network (CNN) on them. For anomaly detection, we use the pretrained CNN handwriting classifier to decide whether 18 cat and dog pictures are handwriting pictures. Because the data sets are imbalanced, the baseline model performs poorly on the minority classes. Hence, we use 6 and 7 different methods to adjust our model. We find that the focal loss function and random over-sampling (ROS) perform best on the multi-class and binary classification tasks on our imbalanced data sets, whereas cost-sensitive learning is not suitable for them. Using confidence estimation, our classifier correctly judges that none of the cat and dog pictures are handwriting pictures.
author2 蔡炎龍
author_facet 蔡炎龍
Tsai, Cheng-Hsiao
蔡承孝
author Tsai, Cheng-Hsiao
蔡承孝
spellingShingle Tsai, Cheng-Hsiao
蔡承孝
Survey on Deep Learning with Imbalanced Data Sets
author_sort Tsai, Cheng-Hsiao
title Survey on Deep Learning with Imbalanced Data Sets
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/b8r339
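
Illustrative note: the abstract in this record names an imbalance ratio ρ, random over-sampling (ROS), and the focal loss. The following is a minimal NumPy sketch of those three ideas, not the thesis's own code. It assumes ρ is the size of the largest class divided by the size of the smallest, that ROS resamples every class with replacement up to the majority-class size, and that the focal loss takes the common form -(1 - p_t)^γ log(p_t); the function names and the default γ = 2 are hypothetical choices made for this example.

```python
import numpy as np


def imbalance_ratio(labels):
    """Assumed definition: largest class size divided by smallest class size."""
    counts = np.bincount(labels)
    counts = counts[counts > 0]  # ignore label values that never occur
    return counts.max() / counts.min()


def random_oversample(features, labels, seed=0):
    """Resample each class with replacement until it matches the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    picked = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        # Draw the missing samples from the existing ones (none for the majority class).
        extra = rng.choice(idx, size=target - count, replace=True)
        picked.append(np.concatenate([idx, extra]))
    order = np.concatenate(picked)
    return features[order], labels[order]


def focal_loss(probs, labels, gamma=2.0):
    """Mean multi-class focal loss; gamma = 0 reduces it to cross-entropy."""
    p_t = probs[np.arange(len(labels)), labels]  # probability of the true class
    p_t = np.clip(p_t, 1e-12, 1.0)               # avoid log(0)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))
```

With γ > 0 the factor (1 - p_t)^γ shrinks the loss on examples the model already classifies confidently, so hard examples, which on a skewed data set tend to come from the minority classes, contribute relatively more to the gradient. ROS instead changes the training set itself and leaves the loss untouched.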