Data Mining of Medical Datasets with Missing Attributes from Different Sources
Main Author: | Sajja, Sunitha |
---|---|
Language: | English |
Published: | Youngstown State University / OhioLINK 2010 |
Subjects: | Computer Science data mining missing attributes data classification outliers |
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263 |
id |
ndltd-OhioLink-oai-etd.ohiolink.edu-ysu1300298263 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-OhioLink-oai-etd.ohiolink.edu-ysu1300298263 2021-08-03T06:18:11Z Data Mining of Medical Datasets with Missing Attributes from Different Sources Sajja, Sunitha Computer Science data mining missing attributes data classification outliers Two major problems in data mining are (1) dealing with missing values in the datasets used for knowledge discovery and (2) using one dataset as a predictor of other datasets. We explore these problems using four datasets from the UCI Machine Learning Repository, which come from four different sources and differ in their missing values. Each dataset contains 13 attributes and one class attribute that denotes the presence or absence of heart disease. Missing values were handled in three ways: first, by the standard mean and mode method; second, by removing the attributes that contain missing values; and third, by removing the records in which more than 60 percent of the values are missing and then filling in the remaining missing values. We also experimented with different classification techniques on the medical datasets, including decision trees, Naive Bayes, and MultiLayerPerceptron, using the RapidMiner and Weka tools. The consistency of the datasets was assessed by combining them and comparing the results on the combined dataset with the classification error on the individual datasets. The results show that when few values are missing, the mean and mode method works well. When a larger share of values is missing, removing the instances in which more than 60 percent of the values are missing and filling in the remaining missing values, together with additional preprocessing steps, works better. Using one dataset as a predictor of another produced moderate accuracy.
2010 English text Youngstown State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws. |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
topic |
Computer Science data mining missing attributes data classification outliers |
author |
Sajja, Sunitha |
title |
Data Mining of Medical Datasets with Missing Attributes from Different Sources |
publisher |
Youngstown State University / OhioLINK |
publishDate |
2010 |
url |
http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263 |
work_keys_str_mv |
AT sajjasunitha dataminingofmedicaldatasetswithmissingattributesfromdifferentsources |
_version_ |
1719434322798182400 |
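
The three missing-value strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation (the abstract states that RapidMiner and Weka were used); the function names, the use of `None` to mark a missing value, and the four toy records are all assumptions made for illustration and are not taken from the UCI datasets.

```python
from statistics import mean, mode

MISSING = None  # assumption: a missing value is represented as None


def impute_mean_mode(rows, numeric_cols):
    """Fill missing values with the column mean (numeric) or mode (nominal)."""
    fill = []
    for c in range(len(rows[0])):
        present = [r[c] for r in rows if r[c] is not MISSING]
        if not present:
            fill.append(MISSING)  # nothing to estimate from
        elif c in numeric_cols:
            fill.append(mean(present))
        else:
            fill.append(mode(present))
    return [[fill[c] if v is MISSING else v for c, v in enumerate(r)]
            for r in rows]


def drop_incomplete_attributes(rows):
    """Remove every column that has at least one missing value."""
    keep = [c for c in range(len(rows[0]))
            if all(r[c] is not MISSING for r in rows)]
    return [[r[c] for c in keep] for r in rows]


def drop_sparse_records(rows, threshold=0.6):
    """Remove records in which more than `threshold` of the values are missing."""
    return [r for r in rows
            if sum(v is MISSING for v in r) / len(r) <= threshold]


# Hypothetical toy records: (age, sex, cholesterol, class label).
rows = [
    [63, "male", 233, 1],
    [None, "male", None, 0],   # 50% missing: kept
    [41, None, 204, 0],        # 25% missing: kept
    [None, None, None, 1],     # 75% missing: removed by the third strategy
]
numeric_cols = {0, 2}

strategy_1 = impute_mean_mode(rows, numeric_cols)
strategy_2 = drop_incomplete_attributes(rows)
strategy_3 = impute_mean_mode(drop_sparse_records(rows), numeric_cols)
```

In this sketch the third strategy reuses the mean and mode fill after the sparse records are dropped, which is one plausible reading of "filling the remaining missing values"; the abstract does not say which fill method was applied at that step.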