Privacy-Preserving Clustering of Data Streams

Master's === Soochow University === Department of Information Management === 97 === Most previous studies on privacy-preserving data mining have focused on securing massive amounts of data in static databases, and data that undergo privacy-preserving processing often yield less accurate mining results. Furthermore,...

Bibliographic Details
Main Authors: Chih-Chin Shen, 沈志欽
Other Authors: Ching-Ming Chao
Format: Others
Language: zh-TW
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/98062034798829819799
id ndltd-TW-097SCU05396002
record_format oai_dc
spelling ndltd-TW-097SCU05396002 2015-11-23T04:03:32Z http://ndltd.ncl.edu.tw/handle/98062034798829819799 Privacy-Preserving Clustering of Data Streams 資料串流分群探勘之隱私保護研究 Chih-Chin Shen 沈志欽 Master's, Soochow University, Department of Information Management, 97
Ching-Ming Chao 趙景明 2009 thesis 70 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Soochow University === Department of Information Management === 97 === Most previous studies on privacy-preserving data mining have focused on securing massive amounts of data in static databases, and data that undergo privacy-preserving processing often yield less accurate mining results. Furthermore, with the rapid advancement of internet and telecommunication technology, data have shifted from traditional static data to data streams that are continuous, rapid, time-varying, transient, and unpredictable. With the rise of such data, traditional privacy-preserving data mining algorithms that require complex calculation are no longer applicable. This thesis therefore proposes a method, Privacy-Preserving Clustering of Data Streams (PPCDS), that improves the data stream mining procedure while preserving both privacy and good mining accuracy. PPCDS consists mainly of two phases: rotation-based perturbation and cluster mining. In the rotation-based perturbation phase, a rotation transformation matrix is applied to rapidly perturb the data streams in order to preserve data privacy. In the cluster mining phase, the perturbed data are first used to establish micro-clusters through optimization of the cluster centers; statistical calculation is then applied to update the micro-clusters, a geometric time frame is used to allocate and store them, and finally the mining result is output through macro-cluster generation. Two simple data structures are added to the macro-cluster generation process to avoid recalculating the distance between the macro-point and the cluster center during generation. This reduces repeated calculation time and thereby improves mining efficiency without losing mining accuracy.
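The two phases named in the abstract can be illustrated with a small sketch. It is not the thesis's PPCDS implementation: it assumes 2-D records, a single hypothetical secret angle `theta`, and a CluStream-style additive summary (count, linear sum, squared sum) as one plausible reading of "statistical calculation to update micro-cluster". All names (`rotate_pair`, `perturb_stream`, `MicroCluster`) are invented for illustration.

```python
import math
from dataclasses import dataclass, field


def rotate_pair(x, y, theta):
    """Rotate the 2-D point (x, y) counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def perturb_stream(stream, theta):
    """Lazily perturb a stream of 2-D records, one record at a time."""
    for x, y in stream:
        yield rotate_pair(x, y, theta)


@dataclass
class MicroCluster:
    """Additive summary of 2-D points (assumed form, not from the thesis)."""
    n: int = 0
    ls: list = field(default_factory=lambda: [0.0, 0.0])  # linear sum
    ss: list = field(default_factory=lambda: [0.0, 0.0])  # squared sum

    def add(self, point):
        """Absorb one point in constant time by updating the statistics."""
        self.n += 1
        for i, v in enumerate(point):
            self.ls[i] += v
            self.ss[i] += v * v

    def center(self):
        """Centroid recovered from the stored statistics."""
        return tuple(v / self.n for v in self.ls)


# Example stream and a hypothetical secret rotation angle.
records = [(1.0, 2.0), (4.0, 6.0), (0.0, 0.0)]
theta = math.radians(37.0)
perturbed = list(perturb_stream(records, theta))

# Summarize the perturbed records without keeping them around.
mc = MicroCluster()
for p in perturbed:
    mc.add(p)
```

Because a rotation preserves Euclidean distances, distance-based clustering on `perturbed` groups records the same way as on `records`, which is the usual reason rotation is chosen when both privacy and accuracy matter. The additive summary lets each arriving record update a cluster without storing the record itself.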
author2 Ching-Ming Chao
author_facet Ching-Ming Chao
Chih-Chin Shen
沈志欽
author Chih-Chin Shen
沈志欽
spellingShingle Chih-Chin Shen
沈志欽
Privacy-Preserving Clustering of Data Streams
author_sort Chih-Chin Shen
title Privacy-Preserving Clustering of Data Streams
publishDate 2009
url http://ndltd.ncl.edu.tw/handle/98062034798829819799