分治式樣本選取法於巨量資料探勘之研究 (A Study of Divide-and-Conquer Instance Selection for Big Data Mining)

Master's === National Central University === Department of Information Management === 104 === In the big data era, data volumes grow rapidly, and so does the amount of noisy data. Instance selection is therefore needed as a pre-processing step to pick out representative data before mining insight from it, so that the quality of the results is maintained. As the amount of data grows, the computat...


Bibliographic Details
Main Author:	Yi-Huan Chen (陳毅寰)
Other Authors:	Chih-Fong Tsai
Format:	Others
Language:	zh-TW
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/47140472636224459393

id	ndltd-TW-104NCU05396072
record_format	oai_dc
spelling	ndltd-TW-104NCU05396072 2017-05-27T04:35:42Z http://ndltd.ncl.edu.tw/handle/47140472636224459393 none 分治式樣本選取法於巨量資料探勘之研究 Yi-Huan Chen 陳毅寰 Master's, National Central University, Department of Information Management, 104. In the big data era, data volumes grow rapidly, and so does the amount of noisy data. Instance selection is therefore needed as a pre-processing step to pick out representative data before mining insight from it, so that the quality of the results is maintained. As the amount of data grows, the computational complexity of instance selection can increase, which also affects the results of data selection and data mining. Additionally, no instance selection algorithm provides the best result for every data set; no single solution is best for every problem. In this work, we propose a divide-and-conquer-based instance selection framework, namely DCIS. First, it breaks the original data set into smaller sub-datasets and arranges them into several groups. Second, it applies an instance selection algorithm to each group sequentially to obtain representative data. Last, it combines the parts into one set as the final instance selection result. We use small data sets to examine the performance of DCIS with different numbers of sub-datasets in the first step and different ways of combination in the final step. Moreover, large-scale data sets are used to assess the applicability of DCIS. The experimental results show that DCIS is a suitable framework for enhancing the performance of instance selection on both small and large-scale data sets. Chih-Fong Tsai 蔡志豐 2016 學位論文 ; thesis 68 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Central University === Department of Information Management === 104 === In the big data era, data volumes grow rapidly, and so does the amount of noisy data. Instance selection is therefore needed as a pre-processing step to pick out representative data before mining insight from it, so that the quality of the results is maintained. As the amount of data grows, the computational complexity of instance selection can increase, which also affects the results of data selection and data mining. Additionally, no instance selection algorithm provides the best result for every data set; no single solution is best for every problem. In this work, we propose a divide-and-conquer-based instance selection framework, namely DCIS. First, it breaks the original data set into smaller sub-datasets and arranges them into several groups. Second, it applies an instance selection algorithm to each group sequentially to obtain representative data. Last, it combines the parts into one set as the final instance selection result. We use small data sets to examine the performance of DCIS with different numbers of sub-datasets in the first step and different ways of combination in the final step. Moreover, large-scale data sets are used to assess the applicability of DCIS. The experimental results show that DCIS is a suitable framework for enhancing the performance of instance selection on both small and large-scale data sets.
author2	Chih-Fong Tsai
author_facet	Chih-Fong Tsai Yi-Huan Chen 陳毅寰
author	Yi-Huan Chen 陳毅寰
spellingShingle	Yi-Huan Chen 陳毅寰 none
author_sort	Yi-Huan Chen
title	none
title_short	none
title_full	none
title_fullStr	none
title_full_unstemmed	none
title_sort	none
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/47140472636224459393
work_keys_str_mv	AT yihuanchen none AT chényìhuán none AT yihuanchen fēnzhìshìyàngběnxuǎnqǔfǎyújùliàngzīliàotànkānzhīyánjiū AT chényìhuán fēnzhìshìyàngběnxuǎnqǔfǎyújùliàngzīliàotànkānzhīyánjiū
_version_	1718454035725418496
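
Illustrative note: the three DCIS steps named in the abstract (divide, select per group, combine) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation. The record does not say which instance selection algorithm or which combination strategy the author used, so Edited Nearest Neighbor (ENN) is swapped in here as a stand-in selector and a plain union of the per-group results is assumed for the combine step. The names `enn_select` and `dcis` and the parameters `n_subsets`, `k`, and `seed` are hypothetical.

```python
import random
from collections import Counter


def enn_select(points, labels, k=3):
    """Stand-in selector (Edited Nearest Neighbor), an assumption.

    Keeps an instance only if the majority label among its k nearest
    neighbours matches its own label. An instance with no neighbours
    (a group of size one) is dropped.
    """
    keep = []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Squared Euclidean distance from p to every other instance.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        neighbours = [labels[j] for _, j in dists[:k]]
        if neighbours and Counter(neighbours).most_common(1)[0][0] == y:
            keep.append(i)
    return keep


def dcis(points, labels, n_subsets=2, select=enn_select, seed=0):
    """Hypothetical divide-select-combine loop in the spirit of DCIS.

    Returns the sorted indices (into the original data) of the
    instances that survive selection.
    """
    indices = list(range(len(points)))
    random.Random(seed).shuffle(indices)

    # Step 1 (divide): deal the shuffled indices into n_subsets groups.
    groups = [indices[s::n_subsets] for s in range(n_subsets)]

    selected = []
    # Step 2 (select): run the selector on each group, one after another.
    for group in groups:
        sub_points = [points[i] for i in group]
        sub_labels = [labels[i] for i in group]
        kept = select(sub_points, sub_labels)
        # Step 3 (combine): assumed here to be a plain union of results.
        selected.extend(group[j] for j in kept)
    return sorted(selected)
```

Calling `dcis(points, labels, n_subsets=4)` on a list of feature tuples and a parallel list of class labels returns the indices of the retained instances. Passing a different callable as `select` swaps in another selection algorithm without changing the divide and combine steps.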

Similar Items