Mining Frequent Itemsets from Uncertain Database

Bibliographic Details
Main Authors: Bei-Chuan Yang, 楊倍權
Other Authors: Show-Jane Yen
Format: Others
Language: zh-TW
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/22190984293586456359
Description
Summary: Master's === Ming Chuan University === Master's Program, Department of Computer Science and Information Engineering === 101 === Mining frequent itemsets from a transaction database aims to find combinations of products that are often purchased together. In other words, an itemset is frequent when the number of transactions containing it, that is, the transactions in which all of those product items were purchased, reaches a user-defined threshold. Frequent itemsets can be bundled as promotional merchandise to increase sales. Currently, the most efficient way to mine frequent itemsets is to use the FP-Tree structure, and the transaction database records only whether each product was purchased. In some applications, however, we cannot determine whether an event occurred. For example, in a hospital's diagnostic records database, a doctor diagnoses which symptoms a patient may have. Each record represents the disease symptoms the patient may be suffering from, and each symptom is recorded with the likelihood the physician assigns to it, usually as a probability value. This type of database is called an uncertain database. If the probability with which a set of items appears in an uncertain database reaches a user-defined threshold, the items in that itemset have a high probability of occurring together, and the itemset is called a frequent itemset. For example, if cough and runny nose have a high probability of occurring together, then when a patient has a cough we can expect it to be accompanied by a runny nose, and the doctor can prescribe preventive medication.
However, it is difficult to find useful information with a single threshold setting. Different thresholds may have to be tried before the combinations of items that meet the user's requirements are found, and rebuilding the FP-Tree every time the threshold is reset wastes time. This thesis therefore proposes a method that builds an FP-Tree-like structure from the database to find frequent itemsets in uncertain data. When the threshold changes, the method does not rebuild the FP-Tree and can directly update the previously identified frequent itemsets. However, if the FP-Tree is built without regard to the threshold, it must store all the items in the database. To reduce the storage space occupied by the FP-Tree, we also compress the FP-Tree and mine frequent itemsets directly from the compressed FP-Tree.
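
The idea of scoring itemsets in an uncertain database once and then reusing the scores under different thresholds can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the thesis: the frequency measure is expected support (the sum, over records, of the product of the item probabilities, treating items as independent), the sample records and their probabilities are invented, and the function names are hypothetical. The sketch enumerates itemsets by brute force; it does not implement the FP-Tree-like structure or the compression proposed in the thesis.

```python
from itertools import combinations
from math import prod

# Hypothetical uncertain database: each record maps a symptom to the
# probability that the patient actually has it (values are made up).
records = [
    {"cough": 0.9, "runny nose": 0.8, "fever": 0.3},
    {"cough": 0.7, "runny nose": 0.6},
    {"cough": 0.2, "fever": 0.9},
    {"runny nose": 0.5, "fever": 0.4},
]


def expected_support(itemset, records):
    """Assumed measure: sum over records of the product of item probabilities."""
    total = 0.0
    for record in records:
        # A record contributes only if it lists every item of the itemset.
        if all(item in record for item in itemset):
            total += prod(record[item] for item in itemset)
    return total


def all_expected_supports(records):
    """Score every itemset once, independently of any threshold."""
    items = sorted({item for record in records for item in record})
    table = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            support = expected_support(itemset, records)
            if support > 0:
                table[itemset] = support
    return table


def frequent_itemsets(table, min_expected_support):
    """Filter the precomputed table; the database is not rescanned."""
    return {
        itemset: support
        for itemset, support in table.items()
        if support >= min_expected_support
    }


# Build the threshold-independent table once ...
table = all_expected_supports(records)

# ... then try several thresholds without touching the records again.
for threshold in (1.7, 1.0, 0.4):
    result = frequent_itemsets(table, threshold)
    print(threshold, sorted(result))
```

Because the table is built without reference to a threshold, it has to hold a score for every itemset that occurs in the data. This is the storage cost the abstract mentions, and the motivation it gives for compressing the tree.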