A clustering scheme for large high-dimensional document datasets

Master's thesis === National Sun Yat-sen University (國立中山大學) === Department of Electrical Engineering (電機工程學系研究所) === 95 (ROC academic year) === Document clustering methods are receiving increasing attention. Because document data are high-dimensional and datasets are large, clustering methods usually need a long time to compute. We propose a scheme that makes a clustering algorithm much faster than the ori...


Bibliographic Details
Main Authors:	Jing-wen Chen, 陳經文
Other Authors:	Shie-jue Lee
Format:	Others
Language:	zh-TW
Published:	2007
Online Access:	http://ndltd.ncl.edu.tw/handle/bsr4gq

id	ndltd-TW-095NSYS5442105
record_format	oai_dc
spelling	ndltd-TW-095NSYS5442105 2019-05-15T19:48:11Z http://ndltd.ncl.edu.tw/handle/bsr4gq A clustering scheme for large high-dimensional document datasets 一個處理大量高維度文件的分群架構 Jing-wen Chen 陳經文 Master's thesis, National Sun Yat-sen University (國立中山大學), Department of Electrical Engineering (電機工程學系研究所), 95 (ROC academic year) Document clustering methods are receiving increasing attention. Because document data are high-dimensional and datasets are large, clustering methods usually need a long time to compute. We propose a scheme that makes a clustering algorithm much faster than the original. We partition the whole dataset into several parts. First, one of these parts is clustered. Then, based on the resulting cluster labels, the number of features is reduced by a certain ratio. Another part of the data is added, the data are converted to the lower dimension, and they are clustered again. This is repeated until all partitions have been used. According to the experimental results, this scheme may run twice as fast as the original clustering method. Shie-jue Lee 李錫智 2007 學位論文 ; thesis 45 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's thesis === National Sun Yat-sen University (國立中山大學) === Department of Electrical Engineering (電機工程學系研究所) === 95 (ROC academic year) === Document clustering methods are receiving increasing attention. Because document data are high-dimensional and datasets are large, clustering methods usually need a long time to compute. We propose a scheme that makes a clustering algorithm much faster than the original. We partition the whole dataset into several parts. First, one of these parts is clustered. Then, based on the resulting cluster labels, the number of features is reduced by a certain ratio. Another part of the data is added, the data are converted to the lower dimension, and they are clustered again. This is repeated until all partitions have been used. According to the experimental results, this scheme may run twice as fast as the original clustering method.
author2	Shie-jue Lee
author_facet	Shie-jue Lee Jing-wen Chen 陳經文
author	Jing-wen Chen 陳經文
author_sort	Jing-wen Chen
title	A clustering scheme for large high-dimensional document datasets
title_sort	clustering scheme for large high-dimensional document datasets
publishDate	2007
url	http://ndltd.ncl.edu.tw/handle/bsr4gq
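
The abstract gives only an outline of the scheme. As a rough illustration, here is a minimal Python sketch of the partition / cluster / reduce / add loop under assumptions that are not from the thesis: k-means with deterministic farthest-first initialization stands in for the unspecified base clustering algorithm, features are ranked by a between-cluster scatter score computed from the current labels, partitions are contiguous slices, and the names `kmeans`, `select_features`, `incremental_cluster`, and `keep_ratio` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data: Sequence[Vector], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means (assumed base algorithm) with farthest-first initialization."""
    centers = [list(data[0])]
    while len(centers) < k:
        # Next center: the row farthest from its nearest existing center.
        far = max(data, key=lambda row: min(sq_dist(row, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: each row goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(row, centers[c])) for row in data]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_features(data: Sequence[Vector], labels: List[int],
                    features: List[int], keep_ratio: float) -> List[int]:
    """Keep the top keep_ratio fraction of features (hypothetical scoring rule).

    Column j of `data` holds original feature features[j]. A feature scores
    high when its per-cluster means are far from its overall mean.
    """
    n = len(data)
    scores = {}
    for j, f in enumerate(features):
        overall = sum(row[j] for row in data) / n
        score = 0.0
        for c in set(labels):
            vals = [row[j] for row, lab in zip(data, labels) if lab == c]
            score += len(vals) * (sum(vals) / len(vals) - overall) ** 2
        scores[f] = score
    n_keep = max(1, int(len(features) * keep_ratio))
    ranked = sorted(features, key=lambda f: scores[f], reverse=True)  # stable on ties
    return sorted(ranked[:n_keep])


def incremental_cluster(docs: Sequence[Vector], k: int, n_parts: int,
                        keep_ratio: float) -> Tuple[List[int], List[int]]:
    """Cluster one part, shrink the feature set, add the next part, recluster.

    Returns labels for all docs (in input order) and the final feature indices.
    """
    size = math.ceil(len(docs) / n_parts)
    parts = [docs[i:i + size] for i in range(0, len(docs), size)]
    features = list(range(len(docs[0])))  # indices into the original feature space
    labels: List[int] = []
    for step in range(len(parts)):
        seen = [row for part in parts[:step + 1] for row in part]
        # Convert every document seen so far to the current (lower) dimension.
        reduced = [[row[f] for f in features] for row in seen]
        labels = kmeans(reduced, k)
        if step < len(parts) - 1:  # no reduction is needed after the last part
            features = select_features(reduced, labels, features, keep_ratio)
    return labels, features
```

With `n_parts=4` and `keep_ratio=0.5`, a 6-feature dataset keeps 3 features after the first round and 1 after the second (the count never drops below 1). Each round reclusters every document seen so far, but in a smaller feature space, which is where any speed-up would come from in this sketch.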

Similar Items