A clustering scheme for large high-dimensional document datasets

Master's === 國立中山大學 (National Sun Yat-sen University) === 電機工程學系研究所 (Graduate Institute of Electrical Engineering) === 95 === Document clustering methods are attracting more and more attention. Because document datasets are high-dimensional and contain a large number of documents, clustering methods usually need a lot of computation time. We propose a scheme that makes the clustering algorithm much faster than ori...

Bibliographic Details
Main Authors: Jing-wen Chen, 陳經文
Other Authors: Shie-jue Lee
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/bsr4gq
id ndltd-TW-095NSYS5442105
record_format oai_dc
spelling ndltd-TW-095NSYS54421052019-05-15T19:48:11Z http://ndltd.ncl.edu.tw/handle/bsr4gq A clustering scheme for large high-dimensional document datasets 一個處理大量高維度文件的分群架構 Jing-wen Chen 陳經文 Master's 國立中山大學 (National Sun Yat-sen University) 電機工程學系研究所 (Graduate Institute of Electrical Engineering) 95 Document clustering methods are attracting more and more attention. Because document datasets are high-dimensional and contain a large number of documents, clustering methods usually need a lot of computation time. We propose a scheme that makes a clustering algorithm much faster than the original. We partition the whole dataset into several parts. First, one of these parts is clustered. Then, according to the cluster labels obtained, we reduce the number of features by a certain ratio. Next, another part of the data is added, the data are converted to the lower dimension, and clustering is run again. This is repeated until all partitions have been used. According to the experimental results, this scheme may run twice as fast as the original clustering method. Shie-jue Lee 李錫智 2007 學位論文 ; thesis 45 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === 國立中山大學 (National Sun Yat-sen University) === 電機工程學系研究所 (Graduate Institute of Electrical Engineering) === 95 === Document clustering methods are attracting more and more attention. Because document datasets are high-dimensional and contain a large number of documents, clustering methods usually need a lot of computation time. We propose a scheme that makes a clustering algorithm much faster than the original. We partition the whole dataset into several parts. First, one of these parts is clustered. Then, according to the cluster labels obtained, we reduce the number of features by a certain ratio. Next, another part of the data is added, the data are converted to the lower dimension, and clustering is run again. This is repeated until all partitions have been used. According to the experimental results, this scheme may run twice as fast as the original clustering method.
author2 Shie-jue Lee
author_facet Shie-jue Lee
Jing-wen Chen
陳經文
author Jing-wen Chen
陳經文
spellingShingle Jing-wen Chen
陳經文
A clustering scheme for large high-dimensional document datasets
author_sort Jing-wen Chen
title A clustering scheme for large high-dimensional document datasets
title_short A clustering scheme for large high-dimensional document datasets
title_full A clustering scheme for large high-dimensional document datasets
title_fullStr A clustering scheme for large high-dimensional document datasets
title_full_unstemmed A clustering scheme for large high-dimensional document datasets
title_sort clustering scheme for large high-dimensional document datasets
publishDate 2007
url http://ndltd.ncl.edu.tw/handle/bsr4gq
work_keys_str_mv AT jingwenchen aclusteringschemeforlargehighdimensionaldocumentdatasets
AT chénjīngwén aclusteringschemeforlargehighdimensionaldocumentdatasets
AT jingwenchen yīgèchùlǐdàliànggāowéidùwénjiàndefēnqúnjiàgòu
AT chénjīngwén yīgèchùlǐdàliànggāowéidùwénjiàndefēnqúnjiàgòu
AT jingwenchen clusteringschemeforlargehighdimensionaldocumentdatasets
AT chénjīngwén clusteringschemeforlargehighdimensionaldocumentdatasets
_version_ 1719094518663348224
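
The loop described in the abstract (cluster one part, reduce the features using the resulting labels, add the next part, cluster again) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the record does not name the base clustering algorithm or the feature-selection criterion, so plain k-means and a "variance of cluster means" score are stand-ins, and the names `kmeans`, `select_features`, `incremental_cluster`, `n_parts`, and `keep_ratio` are hypothetical. The sketch also assumes that each round re-clusters all data seen so far, which the abstract leaves open.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return labels


def select_features(X, labels, keep_ratio):
    """Keep the columns whose per-cluster means vary the most (assumed criterion)."""
    means = np.array([X[labels == j].mean(axis=0) for j in np.unique(labels)])
    score = means.var(axis=0)
    n_keep = max(1, int(round(X.shape[1] * keep_ratio)))
    return np.sort(np.argsort(score)[-n_keep:])


def incremental_cluster(X, k, n_parts=4, keep_ratio=0.5, seed=0):
    """Cluster X part by part, shrinking the feature set before each new round."""
    parts = np.array_split(np.arange(len(X)), n_parts)
    kept = np.arange(X.shape[1])  # original feature indices still in use
    seen = parts[0]
    labels = kmeans(X[np.ix_(seen, kept)], k, seed=seed)
    for part in parts[1:]:
        # Reduce the feature set using the labels from the previous round.
        local = select_features(X[np.ix_(seen, kept)], labels, keep_ratio)
        kept = kept[local]
        # Add the next part and re-cluster everything seen so far
        # in the lower-dimensional space.
        seen = np.concatenate([seen, part])
        labels = kmeans(X[np.ix_(seen, kept)], k, seed=seed)
    return labels, kept
```

Calling `incremental_cluster(X, k)` on a document-term matrix `X` (rows are documents, columns are term features) returns one label per document plus the indices of the features that survived. Any saving in this sketch comes from the later, larger rounds running on fewer columns; whether that reproduces the roughly twofold speedup reported in the abstract depends on the actual algorithm and data, which this record does not describe.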