Data Mining with Uncertain Data

Bibliographic Details
Main Authors: Chih-Wei Wu, 吳志偉
Other Authors: Tzung-Pei Hong
Format: Others
Language: en_US
Published: 2009
Online Access: http://ndltd.ncl.edu.tw/handle/67289209055615848501
id ndltd-TW-097NUK05442037
record_format oai_dc
spelling ndltd-TW-097NUK05442037 2016-06-22T04:13:45Z http://ndltd.ncl.edu.tw/handle/67289209055615848501 Data Mining with Uncertain Data 不明確資料之資料挖掘 Chih-Wei Wu 吳志偉 Master's thesis, National University of Kaohsiung, Master's Program, Department of Electrical Engineering, 97 Tzung-Pei Hong 洪宗貝 2009 degree thesis ; thesis 68 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National University of Kaohsiung === Master's Program, Department of Electrical Engineering === 97 === Machine learning and data mining are two important families of techniques for extracting valuable information from datasets. Although current mining and learning technologies can handle large amounts of data, the rapid growth of datasets may cause some attribute values to go missing during the data-gathering process. Incomplete data usually need to be handled appropriately to improve the quality of the discovered information. The problem of recovering missing values from a dataset has therefore become an important research issue in data mining and machine learning. In this thesis, we first introduce an iterative missing-value completion method that uses RAR support values to extract useful association rules and then applies those rules to infer missing values. By combining an iterative mechanism with data mining techniques, the proposed method can infer all of the missing attribute values. It consists of three phases. The first phase uses association rules to roughly complete the missing values. The second phase iteratively reduces the minimum support to gather more association rules and complete the remaining missing values. The third phase uses association rules mined from the completed dataset to correct the values that were filled in. The approach is then slightly modified to take partial support values into account when deriving missing values. This second approach performs slightly better than the first because it uses more information (incomplete tuples) when guessing. Experimental results show that both proposed approaches achieve good accuracy and data recovery even when the missing-value rate is high.
author2 Tzung-Pei Hong
author_facet Tzung-Pei Hong
Chih-Wei Wu
吳志偉
author Chih-Wei Wu
吳志偉
spellingShingle Chih-Wei Wu
吳志偉
Data Mining with Uncertain Data
author_sort Chih-Wei Wu
title Data Mining with Uncertain Data
publishDate 2009
url http://ndltd.ncl.edu.tw/handle/67289209055615848501
work_keys_str_mv AT chihweiwu dataminingwithuncertaindata
AT wúzhìwěi dataminingwithuncertaindata
AT chihweiwu bùmíngquèzīliàozhīzīliàowājué
AT wúzhìwěi bùmíngquèzīliàozhīzīliàowājué
_version_ 1718314175081480192
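
The three phases named in the abstract can be illustrated with a small sketch. This is not the thesis's RAR-based method, whose details the record does not give. It assumes single-antecedent rules of the form (attribute = value) -> (attribute = value), support and confidence counted only over cells that are present, and a simple halving schedule for the minimum support. The names MISSING, mine_rules, predict and complete, and all threshold defaults, are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import permutations

MISSING = None  # assumed marker for a missing attribute value


def mine_rules(rows, min_support, min_confidence):
    """Mine single-antecedent rules (a = x) -> (b = y) from the cells that are present.

    Returns {(a, x, b): (y, confidence)}, keeping the most confident y per key.
    """
    n = len(rows)
    n_attrs = len(rows[0])
    item_counts = Counter()
    pair_counts = Counter()
    for row in rows:
        for a, x in enumerate(row):
            if x is not MISSING:
                item_counts[(a, x)] += 1
        for a, b in permutations(range(n_attrs), 2):
            if row[a] is not MISSING and row[b] is not MISSING:
                pair_counts[(a, row[a], b, row[b])] += 1
    rules = {}
    for (a, x, b, y), count in pair_counts.items():
        support = count / n
        confidence = count / item_counts[(a, x)]
        if support >= min_support and confidence >= min_confidence:
            best = rules.get((a, x, b))
            if best is None or confidence > best[1]:
                rules[(a, x, b)] = (y, confidence)
    return rules


def predict(row, target, rules):
    """Return the value for column `target` backed by the most confident
    matching rule, or MISSING when no rule applies."""
    best_value, best_conf = MISSING, 0.0
    for a, x in enumerate(row):
        if a == target or x is MISSING:
            continue
        hit = rules.get((a, x, target))
        if hit is not None and hit[1] > best_conf:
            best_value, best_conf = hit
    return best_value


def complete(rows, min_support=0.5, min_confidence=0.6, decay=0.5, floor=0.01):
    """Fill missing cells in three phases; returns a new list of lists."""
    data = [list(r) for r in rows]
    holes = [(i, j) for i, r in enumerate(data)
             for j, v in enumerate(r) if v is MISSING]

    def fill(support):
        rules = mine_rules(data, support, min_confidence)
        for i, j in holes:
            if data[i][j] is MISSING:
                data[i][j] = predict(data[i], j, rules)

    # Phase 1: rough completion at the initial minimum support.
    fill(min_support)

    # Phase 2: lower the minimum support until every hole is filled
    # or the (assumed) floor is reached.
    support = min_support
    while any(data[i][j] is MISSING for i, j in holes) and support > floor:
        support = max(floor, support * decay)
        fill(support)

    # Phase 3: re-mine rules from the completed data and revise the
    # filled-in cells wherever a rule applies.
    rules = mine_rules(data, min_support, min_confidence)
    for i, j in holes:
        revised = predict(data[i], j, rules)
        if revised is not MISSING:
            data[i][j] = revised
    return data
```

Calling complete(rows) on a list of equal-length tuples, with None marking the missing cells, returns a filled copy and leaves the input untouched. Cells that no rule can reach even at the support floor stay None, so this sketch does not guarantee full completion.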