Privacy-Preserving Clustering of Data Streams

Master's === Soochow University === Department of Information Management === 97 === Most earlier studies of privacy-preserving data mining focused on securing massive amounts of data in static databases, and data that undergo privacy preservation often yield less accurate mining results. Furthermore,...

Bibliographic Details
Main Authors:	Chih-Chin Shen, 沈志欽
Other Authors:	Ching-Ming Chao
Format:	Others
Language:	zh-TW
Published:	2009
Online Access:	http://ndltd.ncl.edu.tw/handle/98062034798829819799

id	ndltd-TW-097SCU05396002
record_format	oai_dc
spelling	ndltd-TW-097SCU05396002 2015-11-23T04:03:32Z http://ndltd.ncl.edu.tw/handle/98062034798829819799 Privacy-Preserving Clustering of Data Streams 資料串流分群探勘之隱私保護研究 Chih-Chin Shen 沈志欽 Master's, Soochow University, Department of Information Management, 97 Ching-Ming Chao 趙景明 2009 學位論文 ; thesis 70 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === Soochow University === Department of Information Management === 97 === Most earlier studies of privacy-preserving data mining focused on securing massive amounts of data in static databases, and data that undergo privacy preservation often yield less accurate mining results. Furthermore, with the rapid advancement of Internet and telecommunication technology, data have shifted from traditional static data to data streams, which are continuous, rapid, temporal, transient, and unpredictable. With the rise of such data, traditional privacy-preserving data mining algorithms that require complex computation are no longer applicable. This thesis therefore proposes a method called Privacy-Preserving Clustering of Data Streams (PPCDS), which improves the data stream mining procedure while preserving both privacy and good mining accuracy. PPCDS consists mainly of two phases: rotation-based perturbation and cluster mining. In the rotation-based perturbation phase, a rotation transformation matrix is applied to perturb the data stream rapidly and so preserve data privacy. In the cluster mining phase, the perturbed data are first used to establish micro-clusters through optimization of the cluster centers; statistical calculation is then applied to update the micro-clusters, a geometric time frame is used to allocate and store them, and finally the mining result is produced through macro-cluster generation. Two simple data structures are added to the macro-cluster generation process to avoid recalculating the distance between the macro-point and the cluster center. This reduces repeated calculation time and improves mining efficiency without losing mining accuracy.
author2	Ching-Ming Chao
author_facet	Ching-Ming Chao Chih-Chin Shen 沈志欽
author	Chih-Chin Shen 沈志欽
spellingShingle	Chih-Chin Shen 沈志欽 Privacy-Preserving Clustering of Data Streams
author_sort	Chih-Chin Shen
title	Privacy-Preserving Clustering of Data Streams
title_short	Privacy-Preserving Clustering of Data Streams
title_full	Privacy-Preserving Clustering of Data Streams
title_fullStr	Privacy-Preserving Clustering of Data Streams
title_full_unstemmed	Privacy-Preserving Clustering of Data Streams
title_sort	privacy-preserving clustering of data streams
publishDate	2009
url	http://ndltd.ncl.edu.tw/handle/98062034798829819799
work_keys_str_mv	AT chihchinshen privacypreservingclusteringofdatastreams AT chénzhìqīn privacypreservingclusteringofdatastreams AT chihchinshen zīliàochuànliúfēnqúntànkānzhīyǐnsībǎohùyánjiū AT chénzhìqīn zīliàochuànliúfēnqúntànkānzhīyǐnsībǎohùyánjiū
_version_	1718134923824463872
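
The abstract's rotation-based perturbation phase can be illustrated with a minimal sketch. It is not the thesis's implementation: the two-attribute records, the fixed angle, and the names `rotation_matrix`, `perturb`, and `distance` are all assumptions made for illustration, since the abstract does not say how attributes are paired or how the angle is chosen. The sketch only shows why a rotation suits clustering: each record's values change, but the distances between records do not.

```python
import math

def rotation_matrix(theta):
    """Return the 2x2 rotation matrix for angle theta (in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def perturb(stream, theta):
    """Lazily rotate each two-attribute record as it arrives from the stream."""
    (a, b), (c, d) = rotation_matrix(theta)
    for x, y in stream:
        yield (a * x + b * y, c * x + d * y)

def distance(p, q):
    """Euclidean distance between two two-attribute records."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical records and a hypothetical secret angle (illustration only).
original = [(1.0, 2.0), (4.0, 6.0), (-3.0, 0.5)]
theta = math.radians(37.0)
perturbed = list(perturb(original, theta))

# Pairwise distances before and after perturbation agree up to rounding,
# so a distance-based clustering sees the same geometry.
for i in range(len(original)):
    for j in range(i + 1, len(original)):
        before = distance(original[i], original[j])
        after = distance(perturbed[i], perturbed[j])
        print(f"{i}-{j}: before={before:.6f} after={after:.6f}")
```

Because `perturb` is a generator, each record is transformed with a constant amount of work as it arrives, which matches the abstract's emphasis on perturbing the stream rapidly.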

Similar Items