Privacy-Preserving Clustering of Data Streams
Master's === Soochow University === Department of Information Management === 97 === Most previous studies on privacy-preserving data mining concentrated on the security of massive amounts of data in static databases; as a result, data that has undergone privacy-preserving processing often yields less accurate mining results. Furthermore,...
Main Authors: | Chih-Chin Shen, 沈志欽 |
---|---|
Other Authors: | Ching-Ming Chao |
Format: | Others |
Language: | zh-TW |
Published: | 2009 |
Online Access: | http://ndltd.ncl.edu.tw/handle/98062034798829819799 |
id |
ndltd-TW-097SCU05396002 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-097SCU05396002 2015-11-23T04:03:32Z http://ndltd.ncl.edu.tw/handle/98062034798829819799 Privacy-Preserving Clustering of Data Streams 資料串流分群探勘之隱私保護研究 Chih-Chin Shen 沈志欽 Master's Soochow University Department of Information Management 97 Ching-Ming Chao 趙景明 2009 thesis 70 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === Soochow University === Department of Information Management === 97 === Most previous studies on privacy-preserving data mining concentrated on the security of massive amounts of data in static databases; as a result, data that has undergone privacy-preserving processing often yields less accurate mining results. Furthermore, with the rapid advancement of Internet and telecommunication technology, data has shifted from traditional static data to data streams, which are continuous, rapid, time-varying, transient, and unpredictable. With the rise of such data, traditional privacy-preserving data mining algorithms that require complex computation are no longer applicable.
This thesis therefore proposes a method called Privacy-Preserving Clustering of Data Streams (PPCDS), which improves the data stream mining procedure while preserving both privacy and good mining accuracy. PPCDS consists of two main phases: rotation-based perturbation and cluster mining. In the rotation-based perturbation phase, a rotation transformation matrix is applied to perturb the data stream quickly and so preserve data privacy. In the cluster mining phase, the perturbed data are first used to establish micro-clusters through optimization of the cluster centers; statistical calculations then update the micro-clusters, a geometric time frame is used to allocate and store them, and finally the mining result is produced through macro-cluster generation. Two simple data structures are added to the macro-cluster generation process to avoid recalculating the distance between the macro-point and the cluster center. This cuts repeated computation and improves mining efficiency without losing mining accuracy.
|
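The rotation-based perturbation phase described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how the rotation matrix is built, how attributes are paired, or how the angle is chosen, so the two-attribute records, the fixed angle `THETA`, and the helper names `rotation_matrix`, `perturb`, and `perturb_stream` are all assumptions made for illustration. The sketch only shows why a rotation is a plausible fit for clustering: it changes every attribute value while leaving pairwise Euclidean distances unchanged.

```python
import math


def rotation_matrix(theta):
    """Return the 2x2 rotation matrix for angle theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def perturb(point, theta):
    """Rotate one two-attribute record by theta (hypothetical helper)."""
    r = rotation_matrix(theta)
    x, y = point
    return (r[0][0] * x + r[0][1] * y, r[1][0] * x + r[1][1] * y)


def perturb_stream(stream, theta):
    """Perturb records one at a time, as they arrive from the stream."""
    for record in stream:
        yield perturb(record, theta)


def distance(p, q):
    """Euclidean distance between two records."""
    return math.dist(p, q)


# Assumed secret angle; the thesis's selection rule is not given in the abstract.
THETA = math.radians(37.0)

original = [(1.0, 2.0), (4.0, 6.0), (-3.0, 0.5)]
perturbed = list(perturb_stream(original, THETA))

# Attribute values change, but distances between records do not,
# so a distance-based clustering step sees the same geometry.
for i in range(len(original)):
    for j in range(i + 1, len(original)):
        before = distance(original[i], original[j])
        after = distance(perturbed[i], perturbed[j])
        assert math.isclose(before, after, abs_tol=1e-9)
```

Because each record is transformed with a constant number of multiplications, the perturbation can keep pace with a fast stream, which is consistent with the abstract's emphasis on avoiding complex computation.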
author2 |
Ching-Ming Chao |
author_facet |
Ching-Ming Chao Chih-Chin Shen 沈志欽 |
author |
Chih-Chin Shen 沈志欽 |
author_sort |
Chih-Chin Shen |
title |
Privacy-Preserving Clustering of Data Streams |
publishDate |
2009 |
url |
http://ndltd.ncl.edu.tw/handle/98062034798829819799 |