none
Master's === National Central University === Department of Information Management === 104 === In the big data era, data grows rapidly, and so does noisy data. Instance selection is therefore needed as a data pre-processing step that picks out representative data before mining, so that the quality of the mining results is maintained. As the amount of data grows, the computat...
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Others |
Language: | zh-TW |
Published: |
2016
|
Online Access: | http://ndltd.ncl.edu.tw/handle/47140472636224459393 |
id |
ndltd-TW-104NCU05396072 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-104NCU053960722017-05-27T04:35:42Z http://ndltd.ncl.edu.tw/handle/47140472636224459393 none 分治式樣本選取法於巨量資料探勘之研究 Yi-Huan Chen 陳毅寰 Master's National Central University Department of Information Management 104 In the big data era, data grows rapidly, and so does noisy data. Instance selection is therefore needed as a data pre-processing step that picks out representative data before mining, so that the quality of the mining results is maintained. As the amount of data grows, the computational complexity of performing instance selection can increase, and the results of data selection and data mining are affected as well. Additionally, no single instance selection algorithm provides the best result for every data set; there is no one best solution for every problem. In this work, we propose a divide-and-conquer-based instance selection framework named DCIS. First, it breaks the original data set into smaller sub-datasets and arranges them into several groups. Second, it applies an instance selection algorithm to each group sequentially to obtain representative data. Last, it combines the parts into one set, which is the final result of instance selection. We use small data sets to examine the performance of DCIS with different numbers of sub-datasets in the first step and different ways of combination in the final step. Moreover, large-scale datasets are used to assess the applicability of DCIS. The experimental results show that DCIS is a suitable framework for enhancing the performance of instance selection over both small and large-scale datasets. Chih-Fong Tsai 蔡志豐 2016 學位論文 ; thesis 68 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others
|
sources |
NDLTD |
description |
Master's === National Central University === Department of Information Management === 104 === In the big data era, data grows rapidly, and so does noisy data. Instance selection is therefore needed as a data pre-processing step that picks out representative data before mining, so that the quality of the mining results is maintained.
As the amount of data grows, the computational complexity of performing instance selection can increase, and the results of data selection and data mining are affected as well. Additionally, no single instance selection algorithm provides the best result for every data set; there is no one best solution for every problem.
In this work, we propose a divide-and-conquer-based instance selection framework named DCIS. First, it breaks the original data set into smaller sub-datasets and arranges them into several groups. Second, it applies an instance selection algorithm to each group sequentially to obtain representative data. Last, it combines the parts into one set, which is the final result of instance selection.
We use small data sets to examine the performance of DCIS with different numbers of sub-datasets in the first step and different ways of combination in the final step. Moreover, large-scale datasets are used to assess the applicability of DCIS. The experimental results show that DCIS is a suitable framework for enhancing the performance of instance selection over both small and large-scale datasets.
|
author2 |
Chih-Fong Tsai |
author_facet |
Chih-Fong Tsai Yi-Huan Chen 陳毅寰 |
author |
Yi-Huan Chen 陳毅寰 |
spellingShingle |
Yi-Huan Chen 陳毅寰 none |
author_sort |
Yi-Huan Chen |
title |
none |
title_short |
none |
title_full |
none |
title_fullStr |
none |
title_full_unstemmed |
none |
title_sort |
none |
publishDate |
2016 |
url |
http://ndltd.ncl.edu.tw/handle/47140472636224459393 |
work_keys_str_mv |
AT yihuanchen none AT chényìhuán none AT yihuanchen fēnzhìshìyàngběnxuǎnqǔfǎyújùliàngzīliàotànkānzhīyánjiū AT chényìhuán fēnzhìshìyàngběnxuǎnqǔfǎyújùliàngzīliàotànkānzhīyánjiū |
_version_ |
1718454035725418496 |
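
The three DCIS steps named in the abstract (divide, select, combine) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The record does not say which instance selection algorithm, grouping scheme, or combination method was used, so the sketch assumes a random split into equally sized subsets, Wilson's Edited Nearest Neighbor (ENN) rule as a stand-in selector, and a plain union as the combination step. The names `dcis`, `edited_nearest_neighbor`, `n_subsets`, and `seed` are hypothetical.

```python
import random
from collections import Counter


def edited_nearest_neighbor(points, labels, k=3):
    """Stand-in selector (Wilson's ENN), assumed here for illustration only.

    Keeps a point when its label matches the majority label of its
    k nearest neighbours inside the same subset. Returns local indices.
    """
    keep = []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Squared Euclidean distance to every other point in the subset.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        if votes and votes.most_common(1)[0][0] == y:
            keep.append(i)
    return keep


def dcis(points, labels, n_subsets=2, selector=edited_nearest_neighbor, seed=0):
    """Divide, select, combine. Returns sorted indices of retained instances."""
    indices = list(range(len(points)))
    random.Random(seed).shuffle(indices)

    # Step 1 (divide): assumed random split into n_subsets parts.
    subsets = [indices[i::n_subsets] for i in range(n_subsets)]

    selected = []
    # Step 2 (select): run the selector on each subset, one after another.
    for subset in subsets:
        sub_points = [points[i] for i in subset]
        sub_labels = [labels[i] for i in subset]
        kept_local = selector(sub_points, sub_labels)
        # Map subset-local indices back to indices in the original data set.
        selected.extend(subset[j] for j in kept_local)

    # Step 3 (combine): assumed plain union of the per-subset results.
    return sorted(selected)
```

Because each subset is processed independently, the pairwise distance work in the selector scales with the subset size rather than the full data set size, which is the usual motivation for a divide-and-conquer arrangement. Any other selector with the same `(points, labels) -> kept indices` shape could be passed in through the `selector` argument.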