A clustering scheme for large high-dimensional document datasets
Master's thesis === National Sun Yat-sen University (國立中山大學) === Graduate Institute of Electrical Engineering === 95 === Document clustering methods are attracting growing attention. Because document datasets are high-dimensional and contain many items, clustering them usually takes a long time. We propose a scheme that makes the clustering algorithm much faster than the ori...
Main Authors: | Jing-wen Chen, 陳經文 |
---|---|
Other Authors: | Shie-jue Lee, 李錫智 |
Format: | Others |
Language: | zh-TW |
Published: | 2007 |
Online Access: | http://ndltd.ncl.edu.tw/handle/bsr4gq |
id |
ndltd-TW-095NSYS5442105 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-095NSYS5442105 2019-05-15T19:48:11Z http://ndltd.ncl.edu.tw/handle/bsr4gq A clustering scheme for large high-dimensional document datasets 一個處理大量高維度文件的分群架構 Jing-wen Chen 陳經文 碩士 國立中山大學 電機工程學系研究所 95 Document clustering methods are attracting growing attention. Because document datasets are high-dimensional and contain many items, clustering them usually takes a long time. We propose a scheme that makes the clustering algorithm much faster than the original. We partition the whole dataset into several parts. First, we cluster one of these parts. Then, according to the labels produced by the clustering, we reduce the number of features by a certain ratio. We add another part of the data, convert these data to the lower dimension, and cluster again. We repeat this until all partitions have been used. According to the experimental results, this scheme may run twice as fast as the original clustering method. Shie-jue Lee 李錫智 2007 學位論文 ; thesis 45 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Sun Yat-sen University (國立中山大學) === Graduate Institute of Electrical Engineering === 95 === Document clustering methods are attracting growing attention. Because document datasets are high-dimensional and contain many items, clustering them usually takes a long time. We propose a scheme that makes the clustering algorithm much faster than the original. We partition the whole dataset into several parts. First, we cluster one of these parts. Then, according to the labels produced by the clustering, we reduce the number of features by a certain ratio. We add another part of the data, convert these data to the lower dimension, and cluster again. We repeat this until all partitions have been used. According to the experimental results, this scheme may run twice as fast as the original clustering method. |
author2 |
Shie-jue Lee |
author_facet |
Shie-jue Lee Jing-wen Chen 陳經文 |
author |
Jing-wen Chen 陳經文 |
spellingShingle |
Jing-wen Chen 陳經文 A clustering scheme for large high-dimensional document datasets |
author_sort |
Jing-wen Chen |
title |
A clustering scheme for large high-dimensional document datasets |
publishDate |
2007 |
url |
http://ndltd.ncl.edu.tw/handle/bsr4gq |
_version_ |
1719094518663348224 |
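
The scheme described in the abstract (partition the data, cluster one part, shrink the feature set according to the cluster labels, add the next part in the reduced space, repeat) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation. The record does not say which clustering algorithm or which feature-scoring rule the author used, so the plain k-means routine, the "spread of per-cluster means" score, and the names `kmeans`, `select_features`, `incremental_cluster`, and `keep_ratio` are all assumptions introduced here.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the thesis's clusterer).

    Returns one cluster label per point. Requires len(points) >= k.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_features(points, labels, k, keep_ratio):
    """Keep the fraction `keep_ratio` of features whose per-cluster means
    differ the most (a hypothetical scoring rule, not taken from the thesis)."""
    dim = len(points[0])
    n_keep = max(1, int(dim * keep_ratio))
    scores = []
    for j in range(dim):
        means = []
        for c in range(k):
            vals = [p[j] for p, lab in zip(points, labels) if lab == c]
            if vals:
                means.append(sum(vals) / len(vals))
        centre = sum(means) / len(means)
        scores.append(sum((m - centre) ** 2 for m in means))
    top = sorted(range(dim), key=lambda j: scores[j], reverse=True)[:n_keep]
    return sorted(top)


def incremental_cluster(docs, k, n_parts, keep_ratio, seed=0):
    """Cluster `docs` part by part, reducing the feature set between parts.

    Returns (labels for all docs, indices of the surviving original features).
    The first part must contain at least k documents.
    """
    size = -(-len(docs) // n_parts)  # ceiling division
    parts = [docs[i:i + size] for i in range(0, len(docs), size)]
    kept = list(range(len(docs[0])))  # indices into the original feature space
    seen, labels = [], []
    for i, part in enumerate(parts):
        seen.extend(part)
        # Project every document seen so far onto the current feature subset.
        reduced = [[d[j] for j in kept] for d in seen]
        labels = kmeans(reduced, k, seed=seed)
        if i < len(parts) - 1:
            # Use the labels to shrink the feature set before the next part.
            local = select_features(reduced, labels, k, keep_ratio)
            kept = [kept[j] for j in local]
    return labels, kept
```

A call such as `incremental_cluster(docs, k=2, n_parts=3, keep_ratio=0.5)` would halve the feature count after each of the first two parts. Any speed-up in a sketch like this comes from later rounds running on fewer features, and the factor of two quoted in the abstract is the author's experimental result, not something this sketch demonstrates.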