Efficient Algorithms for Mining Sequential Patterns with Time Constraints (具時間限制之高效率序列樣式探勘演算法)

Master's thesis === Feng Chia University (逢甲大學) === Graduate Institute of Information Engineering (資訊工程所) === 94 === Sequential pattern mining is an important problem in data mining: it finds all frequent subsequences in a sequence database. To obtain more accurate results, constraints beyond the support threshold need to...


Bibliographic Details
Main Author: Chia-Wen Chang (張家汶)
Other Authors: Ming-Yen Lin
Format: Others
Language: en_US
Online Access: http://ndltd.ncl.edu.tw/handle/85732792496754397386
id ndltd-TW-094FCU05392049
record_format oai_dc
spelling ndltd-TW-094FCU05392049 2015-12-11T04:04:18Z http://ndltd.ncl.edu.tw/handle/85732792496754397386 具時間限制之高效率序列樣式探勘演算法 Efficient Algorithms for Mining Sequential Patterns with Time Constraints Chia-Wen Chang 張家汶 碩士 逢甲大學 資訊工程所 94 Ming-Yen Lin 林明言 學位論文 ; thesis 76 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === Feng Chia University (逢甲大學) === Graduate Institute of Information Engineering (資訊工程所) === 94 === Sequential pattern mining is an important problem in data mining: it finds all frequent subsequences in a sequence database. To obtain more accurate results, constraints beyond the support threshold must be specified during mining. Most time-independent constraints can be handled without modifying the underlying mining algorithm, by filtering qualified patterns from the discovered ones. Time constraints, however, cannot be handled by such post-filtering, because computing a pattern's support requires validating the time attributes of every data sequence during mining. This thesis therefore proposes a memory time-indexing approach, called METISP, to discover sequential patterns under time constraints including minimum gap, maximum gap, exact gap, sliding window, and duration. METISP loads the database into memory and constructs time-index sets for efficient processing. Using the index sets and a pattern-growth strategy, METISP mines the desired patterns without generating any candidates or sub-databases. The index sets narrow the search space to the designated in-memory data sequences and speed up counting by restricting it to the indicated ranges of potential items. In addition, a novel algorithm, called CTSP, is proposed for mining closed sequential patterns with time constraints. Closed patterns preserve the complete information in a more compact representation. METISP and CTSP were evaluated against the well-known GSP algorithm and the DELISP algorithm on various datasets and constraints. Comprehensive experiments show that both METISP and CTSP are more efficient, even with low support thresholds and very large databases.
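
The abstract notes that time constraints must be validated against every data sequence while support is counted. The following is a minimal Python sketch of that check only; it is not METISP, CTSP, GSP, or DELISP. The names `supports` and `support`, the `(timestamp, itemset)` layout, and the inclusive bounds `min_gap <= gap <= max_gap` are assumptions made for illustration, and the sliding-window constraint is omitted (an exact gap can be expressed by setting `min_gap` equal to `max_gap`).

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical layout: a data sequence is a list of (timestamp, itemset)
# pairs sorted by strictly increasing timestamp.
Transaction = Tuple[int, FrozenSet[str]]


def supports(sequence: List[Transaction],
             pattern: List[FrozenSet[str]],
             min_gap: int = 1,
             max_gap: Optional[int] = None,
             duration: Optional[int] = None) -> bool:
    """Return True if `sequence` contains `pattern` under the constraints.

    Assumed convention (not taken from the thesis): the gap between two
    consecutive matched transactions must satisfy
    min_gap <= gap <= max_gap, and the span from the first to the last
    matched transaction must not exceed `duration`.
    """
    def search(k: int, start: int,
               prev_time: Optional[int], first_time: Optional[int]) -> bool:
        if k == len(pattern):
            return True  # every pattern element has been matched
        for i in range(start, len(sequence)):
            time, items = sequence[i]
            if prev_time is not None:
                gap = time - prev_time
                if gap < min_gap:
                    continue  # too close to the previous element
                if max_gap is not None and gap > max_gap:
                    break  # later transactions only widen the gap
                if duration is not None and time - first_time > duration:
                    break  # later transactions only lengthen the span
            if pattern[k] <= items:  # itemset containment
                origin = time if first_time is None else first_time
                if search(k + 1, i + 1, time, origin):
                    return True
        return False  # backtrack: try another match for the previous element

    return search(0, 0, None, None)


def support(database: List[List[Transaction]],
            pattern: List[FrozenSet[str]], **constraints) -> int:
    """Count the data sequences that contain `pattern` under the constraints."""
    return sum(supports(seq, pattern, **constraints) for seq in database)
```

Because an occurrence that is valid without time constraints may violate a gap or duration bound while a different occurrence in the same sequence satisfies it, the search backtracks over alternative matches. This is one way to see why such constraints cannot be applied by filtering already discovered patterns.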
author2 Ming-Yen Lin
author_facet Ming-Yen Lin
Chia-Wen Chang
張家汶
author Chia-Wen Chang
張家汶
spellingShingle Chia-Wen Chang
張家汶
Efficient Algorithms for Mining Sequential Patterns with Time Constraints (具時間限制之高效率序列樣式探勘演算法)
author_sort Chia-Wen Chang
title Efficient Algorithms for Mining Sequential Patterns with Time Constraints (具時間限制之高效率序列樣式探勘演算法)
url http://ndltd.ncl.edu.tw/handle/85732792496754397386