Efficient Algorithms for Mining Sequential Patterns with Time Constraints (具時間限制之高效率序列樣式探勘演算法)
Master's === Feng Chia University === Graduate Institute of Information Engineering === 94 === Sequential pattern mining is an important problem in data mining research. Its goal is to find all frequent subsequences in a sequence database. To obtain more accurate results, constraints beyond the support threshold need to...
Main Authors: | Chia-Wen Chang, 張家汶 |
---|---|
Other Authors: | Ming-Yen Lin, 林明言 |
Format: | Others |
Language: | en_US |
Online Access: | http://ndltd.ncl.edu.tw/handle/85732792496754397386 |
id |
ndltd-TW-094FCU05392049 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-094FCU05392049 2015-12-11T04:04:18Z http://ndltd.ncl.edu.tw/handle/85732792496754397386 具時間限制之高效率序列樣式探勘演算法 Efficient Algorithms for Mining Sequential Patterns with Time Constraints Chia-Wen Chang 張家汶 Master's Feng Chia University Graduate Institute of Information Engineering 94 Ming-Yen Lin 林明言 thesis 76 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === Feng Chia University === Graduate Institute of Information Engineering === 94 === Sequential pattern mining is an important problem in data mining research. Its goal is to find all frequent subsequences in a sequence database. To obtain more accurate results, constraints beyond the support threshold need to be specified during mining. Most time-independent constraints can be handled without modifying the underlying mining algorithm, by filtering qualified patterns from the discovered ones. Time constraints, however, cannot be handled by such post-filtering, because computing a pattern's support requires validating the time attributes of every data sequence during the mining process. This thesis therefore proposes a memory time-indexing approach, called METISP, to discover sequential patterns with time constraints, including minimum gap, maximum gap, exact gap, sliding window, and duration. METISP loads the database into memory and constructs time-index sets for effective processing. Using the index sets and a pattern-growth strategy, METISP efficiently mines the desired patterns without generating any candidates or sub-databases. The index sets narrow the search space to the designated in-memory data sequences and speed up counting by restricting it to the indicated ranges of potential items. In addition, a novel algorithm, called CTSP, is proposed for mining closed sequential patterns with time constraints. Closed patterns preserve the complete information in a more compact representation. We evaluated METISP and CTSP against the well-known GSP algorithm and the DELISP algorithm on various datasets and constraints. The comprehensive experiments show that both METISP and CTSP are more efficient, even with low support thresholds and very large databases.
|
author2 |
Ming-Yen Lin |
author_facet |
Ming-Yen Lin Chia-Wen Chang 張家汶 |
author |
Chia-Wen Chang 張家汶 |
spellingShingle |
Chia-Wen Chang 張家汶 Efficient Algorithms for Mining Sequential Patterns with Time Constraints (具時間限制之高效率序列樣式探勘演算法) |
author_sort |
Chia-Wen Chang |
title |
Efficient Algorithms for Mining Sequential Patterns with Time Constraints (具時間限制之高效率序列樣式探勘演算法) |
url |
http://ndltd.ncl.edu.tw/handle/85732792496754397386 |
work_keys_str_mv |
AT chiawenchang jùshíjiānxiànzhìzhīgāoxiàolǜxùlièyàngshìtànkānyǎnsuànfǎ AT zhāngjiāwèn jùshíjiānxiànzhìzhīgāoxiàolǜxùlièyàngshìtànkānyǎnsuànfǎ AT chiawenchang efficientalgorithmsforminingsequentialpatternswithtimeconstraints AT zhāngjiāwèn efficientalgorithmsforminingsequentialpatternswithtimeconstraints |
_version_ |
1718147586643197952 |
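
The description above explains that time constraints must be validated against every data sequence while support is counted, rather than by filtering patterns afterwards. As a rough illustration of what that validation involves, here is a minimal Python sketch of a naive containment check. It is not METISP or CTSP: it uses no time-index sets and no pattern growth. The function names `contains` and `support`, the data layout, and the inclusive reading of the gap bounds are assumptions made for this example, and the sliding-window and exact-gap constraints are omitted.

```python
from typing import FrozenSet, Optional, Sequence, Tuple

# Hypothetical layout: a data sequence is a list of (timestamp, itemset)
# pairs sorted by strictly increasing timestamp.
Element = Tuple[int, FrozenSet[str]]


def contains(
    seq: Sequence[Element],
    pattern: Sequence[FrozenSet[str]],
    mingap: int = 0,
    maxgap: Optional[int] = None,
    duration: Optional[int] = None,
) -> bool:
    """Return True if `seq` holds an occurrence of `pattern` such that
    consecutive matched elements are between mingap and maxgap apart
    (inclusive, an assumption) and the whole occurrence spans at most
    `duration` time units."""

    def search(k: int, start: int, first_t: int, prev_t: int) -> bool:
        if k == len(pattern):
            return True
        for i in range(start, len(seq)):
            t, items = seq[i]
            if k > 0:
                gap = t - prev_t
                if gap < mingap:
                    continue  # too close to the previous match
                if maxgap is not None and gap > maxgap:
                    break  # timestamps ascend, so later ones are farther
                if duration is not None and t - first_t > duration:
                    break  # occurrence would span too long
            if pattern[k] <= items:
                # Backtracking: if this match leads nowhere, try a later one.
                if search(k + 1, i + 1, t if k == 0 else first_t, t):
                    return True
        return False

    return search(0, 0, 0, 0)


def support(db, pattern, **constraints) -> int:
    """Count the data sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern, **constraints) for s in db)


a, b, c = frozenset({"a"}), frozenset({"b"}), frozenset({"c"})
s1 = [(1, a), (3, b), (10, c)]
s2 = [(1, a), (8, b), (9, a), (11, b)]
db = [s1, s2]

print(support(db, [a, b]))              # 2
print(support(db, [a, b], maxgap=2))    # 2 (s2 matches via a@9, b@11)
print(support(db, [a, b], mingap=3))    # 1 (only s2: a@1, b@8)
print(contains(s1, [a, b, c], duration=5))  # False (spans 9 time units)
```

The `maxgap=2` case shows why the constraint cannot be applied after the fact: the first `a` in `s2` has no `b` within range, so the check has to backtrack to a later `a`. A pattern's support under time constraints therefore depends on which occurrences exist in each sequence, not only on whether the unconstrained pattern appears.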