An extended BIRCH-based clustering algorithm for large time-series datasets

Temporal data analysis and mining has attracted substantial interest due to the proliferation and ubiquity of time series in many fields. Time series clustering is one of the most popular mining methods, and many time series clustering algorithms primarily focus on detecting the clusters in a batch fas...

Bibliographic Details
Main Author:	Lei, Jiahuan
Format:	Others
Language:	English
Published:	Mittuniversitetet, Avdelningen för informations- och kommunikationssystem 2017
Subjects:	Time series; Data stream; Clustering; BIRCH; DTW; DBA; Computer Engineering; Datorteknik
Online Access:	http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-29858

id	ndltd-UPSALLA1-oai-DiVA.org-miun-29858
record_format	oai_dc
spelling	ndltd-UPSALLA1-oai-DiVA.org-miun-29858; 2018-01-14T05:13:08Z; An extended BIRCH-based clustering algorithm for large time-series datasets; eng; Lei, Jiahuan; Mittuniversitetet, Avdelningen för informations- och kommunikationssystem; 2017; Time series; Data stream; Clustering; BIRCH; DTW; DBA; Computer Engineering; Datorteknik; Student thesis; info:eu-repo/semantics/bachelorThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-29858; Local DT-V16-A2-003; application/pdf; info:eu-repo/semantics/openAccess
collection	NDLTD
language	English
format	Others
sources	NDLTD
topic	Time series; Data stream; Clustering; BIRCH; DTW; DBA; Computer Engineering; Datorteknik
spellingShingle	Time series; Data stream; Clustering; BIRCH; DTW; DBA; Computer Engineering; Datorteknik; Lei, Jiahuan; An extended BIRCH-based clustering algorithm for large time-series datasets
description	Temporal data analysis and mining has attracted substantial interest due to the proliferation and ubiquity of time series in many fields. Time series clustering is one of the most popular mining methods, and many time series clustering algorithms primarily focus on detecting clusters in a batch fashion, which uses a lot of memory and thus limits scalability and capability for large time series. The BIRCH algorithm has been proven to scale well to large datasets and is characterized by incrementally clustering data objects using a single scan. However, the Euclidean distance metric employed in BIRCH has been proven to be inaccurate for time series and degrades accuracy. To overcome this drawback, this work proposes an extended BIRCH algorithm for large time series. The BIRCH clustering algorithm is extended by changing the cluster feature vector to the proposed modified cluster feature, replacing the original Euclidean distance measure with dynamic time warping (DTW), and employing the DTW barycenter averaging (DBA) method as the centroid computation approach, which is more suitable for time-series clustering than other averaging methods. To demonstrate the effectiveness of the proposed algorithm, we conducted an extensive evaluation of our algorithm against BIRCH, k-means, and their variants with combinations of competitive distance measures. Experimental results show that the extended BIRCH algorithm improves accuracy significantly compared to the BIRCH algorithm and its variants, and achieves accuracy competitive with k-means and its variant, k-DBA. However, unlike k-means and k-DBA, the extended BIRCH algorithm maintains the ability to incrementally handle continuously incoming data objects, which is the key to clustering large time-series datasets. Finally, the extended BIRCH-based algorithm is applied to solve a subsequence time-series clustering task on a simulation multi-variate time-series dataset with the help of a sliding window.
author	Lei, Jiahuan
author_facet	Lei, Jiahuan
author_sort	Lei, Jiahuan
title	An extended BIRCH-based clustering algorithm for large time-series datasets
title_short	An extended BIRCH-based clustering algorithm for large time-series datasets
title_full	An extended BIRCH-based clustering algorithm for large time-series datasets
title_fullStr	An extended BIRCH-based clustering algorithm for large time-series datasets
title_full_unstemmed	An extended BIRCH-based clustering algorithm for large time-series datasets
title_sort	extended birch-based clustering algorithm for large time-series datasets
publisher	Mittuniversitetet, Avdelningen för informations- och kommunikationssystem
publishDate	2017
url	http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-29858
work_keys_str_mv	AT leijiahuan anextendedbirchbasedclusteringalgorithmforlargetimeseriesdatasets AT leijiahuan extendedbirchbasedclusteringalgorithmforlargetimeseriesdatasets
_version_	1718610822769410048
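
The description field above says the thesis replaces BIRCH's Euclidean distance with dynamic time warping (DTW). As a rough illustration of why that matters for time series, the following is a minimal sketch of the textbook DTW recurrence, not the thesis's implementation. The function names `dtw_distance` and `euclidean_distance` are hypothetical, and the squared pointwise cost with a final square root is one common convention assumed here.

```python
import math


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW distance between two numeric sequences.

    Assumption: squared pointwise cost, square root of the total at the end.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance for equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Same bump, shifted by one time step.
    s1 = [0, 0, 1, 2, 1, 0, 0]
    s2 = [0, 1, 2, 1, 0, 0, 0]
    print("DTW:", dtw_distance(s1, s2))
    print("Euclidean:", euclidean_distance(s1, s2))
```

For the two shifted sequences in the usage example, the warping path absorbs the time shift, so the DTW distance is smaller than the Euclidean distance, which penalizes every misaligned point.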

Similar Items