Summary: | Master's === Tamkang University === Master's Program, Department of Information Management === 93 === Research on sequential pattern mining can be classified into three categories, according to whether items that are consecutive in a sequential pattern must also be consecutive in the transactions: the first finds continuous patterns; the second finds discontinuous patterns; the third finds hybrid patterns that combine continuous and discontinuous patterns. Previous hybrid sequential pattern mining algorithms were all based on the Apriori algorithm, and we found that their mining results are incomplete. We therefore propose a new algorithm, CHSPM, based on the pattern-growth method, to find the complete set of hybrid sequential patterns.
The four steps of CHSPM are as follows: 1. Build the supplemented frequent-1-sequence item set. 2. Reduce the database by removing unimportant items from the transactions. 3. Partition the database and build projected databases. 4. Recursively mine the projected databases, building sub-projected databases, until all hybrid sequential patterns are found.
Finally, we test the algorithm on synthetic databases of 100,000 to 300,000 records and compare its results with those of GFP2, the most efficient hybrid sequential pattern mining algorithm to date. The results show that although CHSPM is slower than GFP2, it finds the complete set of hybrid sequential patterns.
|
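The three pattern categories named in the summary differ only in how a pattern is matched against a transaction, which a small example may make concrete. The following is a minimal Python sketch of that matching and of support counting; it is not CHSPM or GFP2. The function names, the toy database, and the representation of a hybrid pattern as a list of contiguous blocks separated by gaps that may be empty are all assumptions made for illustration, not definitions taken from the thesis.

```python
from typing import Callable, List, Sequence


def contains_continuous(seq: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `seq` as one contiguous run."""
    m = len(pattern)
    return any(list(seq[i:i + m]) == list(pattern)
               for i in range(len(seq) - m + 1))


def contains_discontinuous(seq: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the items of `pattern` occur in `seq` in order, gaps allowed."""
    it = iter(seq)
    # `item in it` consumes the iterator up to the first match,
    # so each later item is searched only after the previous one.
    return all(item in it for item in pattern)


def contains_hybrid(seq: Sequence[str], blocks: List[Sequence[str]]) -> bool:
    """True if every block occurs contiguously, in order, without overlap.

    Assumption: a hybrid pattern is a list of contiguous blocks, and any
    number of items (including zero) may lie between consecutive blocks.
    """
    start = 0
    for block in blocks:
        m = len(block)
        for i in range(start, len(seq) - m + 1):
            if list(seq[i:i + m]) == list(block):
                start = i + m  # the next block must begin after this one
                break
        else:
            return False  # this block was not found after the previous one
    return True


def support(db: List[Sequence[str]], matcher: Callable, pattern) -> int:
    """Number of sequences in `db` that contain `pattern` under `matcher`."""
    return sum(1 for seq in db if matcher(seq, pattern))


# Hypothetical toy database of three transactions.
db = [list("abcd"), list("acbd"), list("abd")]
continuous_ab = support(db, contains_continuous, list("ab"))
discontinuous_ab = support(db, contains_discontinuous, list("ab"))
hybrid_ab_d = support(db, contains_hybrid, [list("ab"), list("d")])
```

On this toy data, the pattern "a then b" is supported by fewer transactions when the two items must be adjacent than when gaps are allowed, which is why the choice of category changes which patterns count as frequent.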