Efficient Sequential Pattern Mining by Breadth-First Approach



Bibliographic Details
Main Authors: Keng-Yuan Chang, 張耕源
Other Authors: 李瑞庭
Format: Others
Language: en_US
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/14108498328012924958
Description
Summary: Master's thesis === National Taiwan University === Graduate Institute of Information Management === academic year 92 === Since the GSP algorithm was proposed for mining sequential patterns in sequence databases, many methods have been developed, most of them focused on mining the complete set of frequent patterns. The CloSpan algorithm first suggested that the closed set of sequential patterns is more compact than the full set while having the same expressive power. Building on the PrefixSpan algorithm, CloSpan added two pruning techniques, backward sub-pattern and backward super-pattern, to mine the closed set efficiently. In this thesis, we propose a new sequential pattern mining algorithm for mining closed sequences. Unlike many previous methods, which use depth-first search, our algorithm adopts a breadth-first approach. Moreover, previous methods seldom exploit item ordering to improve efficiency. We use a list of positional data to preserve item-ordering information. Using these positional data, we develop two main pruning techniques: the backward super-pattern condition and the same positional data condition. To ensure that the resulting lattice is correct and compact, we also handle several special conditions. Experimental results show that our algorithm outperforms CloSpan on moderately large datasets and at low support thresholds.
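To make the summary's concepts concrete (closed patterns, breadth-first growth, positional data), here is a minimal sketch under several stated assumptions. It is not the thesis's algorithm. Sequences are simplified to lists of single items rather than itemsets. The "positional data" kept per pattern is assumed to be, for each supporting sequence, the position where the pattern's earliest match ends. Patterns grow one level (one length) at a time. Closedness is checked with a brute-force post-filter: a pattern is dropped if a longer frequent super-pattern has exactly the same support. The function names `mine_closed_sequences` and `is_subsequence` are hypothetical. The backward super-pattern and same positional data pruning conditions are not implemented.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]

def is_subsequence(small: Sequence[str], big: Sequence[str]) -> bool:
    """True if `small` occurs in `big` in order (not necessarily contiguously)."""
    it = iter(big)
    return all(item in it for item in small)  # `in` consumes the iterator

def mine_closed_sequences(db: List[List[str]], min_sup: int) -> Dict[Pattern, int]:
    """Hypothetical breadth-first miner: frequent sequences level by level, then a closure filter.

    Assumption: each sequence is a list of single items (no itemsets).
    """
    # Level 1. Positional data: {sequence id: position where the pattern's earliest match ends}.
    level: Dict[Pattern, Dict[int, int]] = {}
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            level.setdefault((item,), {}).setdefault(sid, pos)  # keep first occurrence only
    level = {p: pd for p, pd in level.items() if len(pd) >= min_sup}
    frequent_items = sorted(p[0] for p in level)

    frequent: Dict[Pattern, int] = {}
    while level:  # each pass handles all patterns of one length (breadth-first)
        frequent.update({p: len(pd) for p, pd in level.items()})
        next_level: Dict[Pattern, Dict[int, int]] = {}
        for pattern, posdata in level.items():
            for item in frequent_items:
                new_pd: Dict[int, int] = {}
                for sid, end in posdata.items():
                    seq = db[sid]
                    # pattern + item occurs in seq iff item appears after the pattern's earliest end
                    for pos in range(end + 1, len(seq)):
                        if seq[pos] == item:
                            new_pd[sid] = pos
                            break
                if len(new_pd) >= min_sup:
                    next_level[pattern + (item,)] = new_pd
        level = next_level

    # Closure filter: drop p if a longer frequent super-pattern has exactly the same support.
    return {
        p: sup
        for p, sup in frequent.items()
        if not any(
            q_sup == sup and len(q) > len(p) and is_subsequence(p, q)
            for q, q_sup in frequent.items()
        )
    }

if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "b", "c", "b"]]
    print(mine_closed_sequences(db, min_sup=2))
    # {('a', 'c'): 3, ('a', 'b', 'c'): 2}
```

In this toy database, `(a)`, `(b)`, `(c)`, `(a, b)`, and `(b, c)` are all frequent. None of them is closed, because `(a, c)` (support 3) or `(a, b, c)` (support 2) is a super-pattern with the same support. Storing only the earliest end position per sequence is enough when patterns are extended by appending items. The quadratic post-filter is where a real closed-pattern miner would instead prune during the search. This sketch makes no attempt to reproduce the thesis's specific conditions.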