Summary: | <p> The current Big Data era is generating tremendous amounts of data in most fields, such as business, social media, engineering, and medicine. The demand to process the resulting "big data" has created a need for fast data mining methods with which to build powerful and versatile analysis tools that turn data into useful knowledge. Frequent pattern mining (FPM) is an important data mining task with numerous applications, including recommendation systems, consumer market analysis, web mining, and network intrusion detection. We develop efficient, high-performance FPM methods for large-scale databases on different computing platforms, including personal computers (PCs), multi-core multi-socket servers, clusters, and graphics processing units (GPUs). At the core of our research is a novel self-adaptive approach that runs efficiently on both sparse and dense databases and outperforms its sequential counterparts. This approach applies multiple mining strategies and dynamically switches among them based on the data characteristics detected at runtime. The research results include two sequential FPM methods (FEM and DFEM) and three parallel ones (ShaFEM, SDFEM, and CGMM). These methods can be used to develop powerful and scalable mining tools for big data analysis. We have tested and analysed these methods and demonstrated their efficacy on selected representative real databases publicly available at the Frequent Itemset Mining Implementations Repository.</p>
|