A Cluster-based Genetic Approach for Segmentation of Time Series and Pattern Discovery

Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === academic year 96 === A time series is composed of many data points, each of which represents a value at a certain time. Many phenomena can be represented as time series, such as electrocardiograms in medicine, gene expression data in biology, and video data in multimedia. Tim...


Bibliographic Details
Main Authors: Pai-Chieh Huang, 黃柏傑
Other Authors: Shin-Mu Tseng
Format: Others
Language: zh-TW
Published: 2008
Online Access: http://ndltd.ncl.edu.tw/handle/36963777508977675700
id ndltd-TW-096NCKU5392033
record_format oai_dc
spelling ndltd-TW-096NCKU5392033 2015-11-23T04:02:51Z http://ndltd.ncl.edu.tw/handle/36963777508977675700 A Cluster-based Genetic Approach for Segmentation of Time Series and Pattern Discovery 以群集式基因演算法為基礎的時間序列切割法與型樣之發掘 (Time Series Segmentation and Pattern Discovery Based on a Cluster-based Genetic Algorithm) Pai-Chieh Huang 黃柏傑 Master's National Cheng Kung University Department of Computer Science and Information Engineering (Master's and Doctoral Program) 96 A time series is composed of many data points, each of which represents a value at a certain time. Many phenomena can be represented as time series, such as electrocardiograms in medicine, gene expression data in biology, and video data in multimedia. Because time series appear frequently in many applications, they have become an important and active research field, related to topics such as anomaly detection, similarity measurement, dimensionality reduction, and segmentation. In this thesis, we propose a time series segmentation approach that combines a clustering technique, the discrete wavelet transform (DWT), and a genetic algorithm to automatically find segments and patterns in a time series, and that reduces two problems of a previous approach. First, using the DWT to adjust the lengths of the subsequences may distort the segments. Second, if a group contains only one segment, the resulting pattern may be less meaningful. The proposed approach first uses clustering to divide the segments in a chromosome into k groups according to their slopes. To deal with the two problems, it introduces two factors: the distortion factor, which avoids distortion of the segments, and the density factor, which avoids the generation of meaningless patterns. The fitness value of a chromosome is then evaluated from the distances of the segments together with these two factors. Experimental results on real financial datasets from Taiwan show the effectiveness of the proposed approach.
Shin-Mu Tseng 曾新穆 2008 degree thesis ; thesis 68 zh-TW
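The abstract's first step, dividing the segments in a chromosome into k groups according to their slopes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes a chromosome is encoded as a sorted list of cut points, computes a least-squares slope per segment, and swaps in plain one-dimensional k-means as the clustering technique. The names `decode_segments`, `slope`, and `kmeans_1d` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def decode_segments(series: Sequence[float], cut_points: Sequence[int]) -> List[List[float]]:
    """Split a series at a chromosome's cut points.

    Hypothetical encoding: the chromosome is a list of interior indices.
    """
    bounds = [0, *sorted(cut_points), len(series)]
    return [list(series[a:b]) for a, b in zip(bounds, bounds[1:]) if b > a]


def slope(segment: Sequence[float]) -> float:
    """Least-squares slope of a segment against time steps 0..n-1."""
    n = len(segment)
    if n < 2:
        return 0.0
    mean_t = (n - 1) / 2
    mean_y = sum(segment) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(segment))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def kmeans_1d(values: Sequence[float], k: int, iters: int = 100,
              seed: int = 0) -> Tuple[List[int], List[float]]:
    """Plain 1-D k-means (an assumed stand-in for the clustering step)."""
    distinct = sorted(set(values))
    if k > len(distinct):
        raise ValueError("k exceeds the number of distinct slopes")
    centers = random.Random(seed).sample(distinct, k)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each slope to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Move each center to the mean of its members (keep it if empty).
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


series = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1]
segments = decode_segments(series, [5, 10, 15])
slopes = [slope(s) for s in segments]
labels, centers = kmeans_1d(slopes, k=2)
```

For this toy series, `slopes` is `[1.0, -1.0, 1.0, -1.0]`, so the rising and falling segments land in different groups. A genetic algorithm would evolve the cut points and re-run this grouping for each chromosome.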
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === academic year 96 === A time series is composed of many data points, each of which represents a value at a certain time. Many phenomena can be represented as time series, such as electrocardiograms in medicine, gene expression data in biology, and video data in multimedia. Because time series appear frequently in many applications, they have become an important and active research field, related to topics such as anomaly detection, similarity measurement, dimensionality reduction, and segmentation. In this thesis, we propose a time series segmentation approach that combines a clustering technique, the discrete wavelet transform (DWT), and a genetic algorithm to automatically find segments and patterns in a time series, and that reduces two problems of a previous approach. First, using the DWT to adjust the lengths of the subsequences may distort the segments. Second, if a group contains only one segment, the resulting pattern may be less meaningful. The proposed approach first uses clustering to divide the segments in a chromosome into k groups according to their slopes. To deal with the two problems, it introduces two factors: the distortion factor, which avoids distortion of the segments, and the density factor, which avoids the generation of meaningless patterns. The fitness value of a chromosome is then evaluated from the distances of the segments together with these two factors. Experimental results on real financial datasets from Taiwan show the effectiveness of the proposed approach.
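The two problems the abstract names, distortion from DWT-based length adjustment and weak patterns from sparsely populated groups, can be made concrete with a small sketch. It assumes the Haar wavelet and shrinks a segment by keeping only the approximation coefficients, rescaled to pairwise averages. The distortion measure (mean absolute error of the piecewise-constant reconstruction), the density penalty (fraction of groups holding at most one segment), and the weighted-sum `fitness` with weights `alpha` and `beta` are all hypothetical stand-ins; the abstract does not give the thesis's actual definitions.

```python
from collections import Counter
from typing import List, Sequence


def haar_approx(signal: Sequence[float]) -> List[float]:
    """One level of a Haar DWT, keeping only the approximation coefficients.

    The orthonormal coefficients (x0 + x1) / sqrt(2) are rescaled by
    1/sqrt(2) so each output is the average of a consecutive pair. An odd
    trailing sample is dropped, which is itself a source of distortion.
    """
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def shrink(signal: Sequence[float], levels: int) -> List[float]:
    """Apply `levels` Haar steps, roughly halving the length each time."""
    out = list(signal)
    for _ in range(levels):
        out = haar_approx(out)
    return out


def distortion(signal: Sequence[float], levels: int) -> float:
    """Hypothetical distortion measure: mean absolute error between the
    original segment and the reconstruction obtained by repeating each
    shrunken value 2**levels times."""
    if len(signal) < 2 ** levels:
        raise ValueError("segment too short for the requested number of levels")
    approx = shrink(signal, levels)
    recon = [v for v in approx for _ in range(2 ** levels)]
    # Pad with the last value to cover samples dropped at odd lengths.
    recon += [approx[-1]] * (len(signal) - len(recon))
    return sum(abs(x - r) for x, r in zip(signal, recon)) / len(signal)


def density_penalty(labels: Sequence[int], k: int) -> float:
    """Hypothetical density measure: fraction of the k groups that contain
    at most one segment, since such groups give less meaningful patterns."""
    counts = Counter(labels)
    return sum(1 for j in range(k) if counts[j] <= 1) / k


def fitness(total_distance: float, distortion_value: float,
            density_value: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical cost to minimize: segment distances plus the two
    factors, weighted by the assumed parameters alpha and beta."""
    return total_distance + alpha * distortion_value + beta * density_value
```

For example, `distortion([1, 3, 5, 7], 1)` is 1.0 while `distortion([1, 3, 5, 7], 2)` is 2.0: each extra halving discards more of the segment's shape. Treating the factors as additive penalties is only one design choice; a multiplicative combination would be equally plausible without the thesis's definitions.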
author2 Shin-Mu Tseng
author Pai-Chieh Huang
黃柏傑
author_sort Pai-Chieh Huang
title A Cluster-based Genetic Approach for Segmentation of Time Series and Pattern Discovery
title_sort cluster-based genetic approach for segmentation of time series and pattern discovery
publishDate 2008
url http://ndltd.ncl.edu.tw/handle/36963777508977675700