none

Master's === National Central University (國立中央大學) === Department of Information Management (資訊管理學系) === 104 === In the big data era, data volumes grow rapidly, and so does the amount of noisy data. Instance selection is therefore needed as a pre-processing step to pick out representative data before mining, so that the quality of the mining results is maintained. As the amount of data grows, the computat...


Bibliographic Details
Main Authors: Yi-Huan Chen, 陳毅寰
Other Authors: Chih-Fong Tsai
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/47140472636224459393
id ndltd-TW-104NCU05396072
record_format oai_dc
spelling ndltd-TW-104NCU05396072 2017-05-27T04:35:42Z http://ndltd.ncl.edu.tw/handle/47140472636224459393 none 分治式樣本選取法於巨量資料探勘之研究 [A Study of a Divide-and-Conquer Instance Selection Method for Big Data Mining] Yi-Huan Chen 陳毅寰 Master's National Central University (國立中央大學) Department of Information Management (資訊管理學系) 104 In the big data era, data volumes grow rapidly, and so does the amount of noisy data. Instance selection is therefore needed as a pre-processing step to pick out representative data before mining, so that the quality of the mining results is maintained. As the amount of data grows, the computational complexity of instance selection can increase, which in turn affects the results of both data selection and data mining. Additionally, no single instance selection algorithm provides the best result for every data set, and no single solution is best for every problem. In this work, we propose a divide-and-conquer-based instance selection framework, named DCIS. First, it breaks the original data set into smaller sub-datasets and arranges them into several groups. Second, it applies an instance selection algorithm to each group in turn to obtain representative data. Finally, it combines the selected parts into one set as the final result of instance selection. We use small data sets to examine the performance of DCIS with different numbers of sub-datasets in the first step and different ways of combination in the final step. Large-scale data sets are also used to assess the applicability of DCIS. The experimental results show that DCIS is a suitable framework for enhancing the performance of instance selection on both small and large-scale data sets. Chih-Fong Tsai 蔡志豐 2016 學位論文 ; thesis 68 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Central University (國立中央大學) === Department of Information Management (資訊管理學系) === 104 === In the big data era, data volumes grow rapidly, and so does the amount of noisy data. Instance selection is therefore needed as a pre-processing step to pick out representative data before mining, so that the quality of the mining results is maintained. As the amount of data grows, the computational complexity of instance selection can increase, which in turn affects the results of both data selection and data mining. Additionally, no single instance selection algorithm provides the best result for every data set, and no single solution is best for every problem. In this work, we propose a divide-and-conquer-based instance selection framework, named DCIS. First, it breaks the original data set into smaller sub-datasets and arranges them into several groups. Second, it applies an instance selection algorithm to each group in turn to obtain representative data. Finally, it combines the selected parts into one set as the final result of instance selection. We use small data sets to examine the performance of DCIS with different numbers of sub-datasets in the first step and different ways of combination in the final step. Large-scale data sets are also used to assess the applicability of DCIS. The experimental results show that DCIS is a suitable framework for enhancing the performance of instance selection on both small and large-scale data sets.
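The three steps named in the abstract (split, select per part, combine) can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not say which instance selection algorithm, grouping scheme, or combination rule DCIS uses. The sketch assumes Wilson's Edited Nearest Neighbor (ENN) as the selector, a random even split as the divide step, and a plain union as the combine step. The names `enn_select`, `dcis`, `n_parts`, and `seed` are hypothetical.

```python
import random
from collections import Counter


def enn_select(points, labels, k=3):
    """Edited Nearest Neighbor (assumed selector, not confirmed by the record).

    Keeps an instance only if the majority label among its k nearest
    neighbors equals its own label. Returns positions of kept instances.
    """
    keep = []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Squared Euclidean distance from instance i to every other instance.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        # An instance with no neighbors (a part of size 1) is dropped.
        if votes and votes.most_common(1)[0][0] == y:
            keep.append(i)
    return keep


def dcis(points, labels, n_parts=2, selector=enn_select, seed=0):
    """Divide-and-conquer instance selection sketch.

    1. Divide: shuffle the indices and deal them into n_parts sub-datasets.
    2. Conquer: run the selector on each sub-dataset in turn.
    3. Combine: take the union of the selected instances (assumed rule).
    Returns the sorted original indices of the selected instances.
    """
    idx = list(range(len(points)))
    random.Random(seed).shuffle(idx)
    parts = [idx[i::n_parts] for i in range(n_parts)]

    selected = []
    for part in parts:
        sub_points = [points[i] for i in part]
        sub_labels = [labels[i] for i in part]
        kept = selector(sub_points, sub_labels)
        # Map positions inside the sub-dataset back to original indices.
        selected.extend(part[j] for j in kept)
    return sorted(selected)
```

Calling `dcis(points, labels, n_parts=4)` on a list of numeric tuples and a parallel list of class labels returns the indices to keep. Because each part is processed independently, the per-part neighbor search scans only about `len(points) / n_parts` instances, which is the cost reduction the abstract's motivation points to; the trade-off is that each selector call sees only a sample of the data.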
author2 Chih-Fong Tsai
author_facet Chih-Fong Tsai
Yi-Huan Chen
陳毅寰
author Yi-Huan Chen
陳毅寰
spellingShingle Yi-Huan Chen
陳毅寰
none
author_sort Yi-Huan Chen
title none
title_short none
title_full none
title_fullStr none
title_full_unstemmed none
title_sort none
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/47140472636224459393
work_keys_str_mv AT yihuanchen none
AT chényìhuán none
AT yihuanchen fēnzhìshìyàngběnxuǎnqǔfǎyújùliàngzīliàotànkānzhīyánjiū
AT chényìhuán fēnzhìshìyàngběnxuǎnqǔfǎyújùliàngzīliàotànkānzhīyánjiū
_version_ 1718454035725418496