Data Mining of Medical Datasets with Missing Attributes from Different Sources

Bibliographic Details
Main Author: Sajja, Sunitha
Language: English
Published: Youngstown State University / OhioLINK 2010
Subjects: Computer Science; data mining; missing attributes; data classification; outliers
Online Access: http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263

Abstract

Two major problems in data mining are 1) dealing with missing values in the datasets used for knowledge discovery, and 2) using one dataset as a predictor of other datasets. We explore these problems using four datasets from the UCI Machine Learning Repository, each collected from a different source and each with a different pattern of missing values. Each dataset contains 13 attributes and one class attribute that denotes the presence or absence of heart disease. Missing values were handled in three ways: first, by the standard mean and mode method (replacing each missing value with the attribute's mean or mode); second, by removing the attributes that contain missing values; and third, by removing the records with more than 60 percent of their values missing and then filling the remaining missing values. We also experimented with different classification techniques, including decision trees, Naive Bayes, and MultiLayerPerceptron, applied to the medical datasets using the RapidMiner and Weka tools. The consistency of the datasets was assessed by combining them and comparing the classification error on the combined dataset with the classification error on the individual datasets. The results show that when few values are missing, the mean and mode method performs well. When many values are missing, removing the instances with more than 60 percent of their values missing, filling the remaining missing values, and applying additional preprocessing steps works better. Using one dataset as a predictor of another produced moderate accuracy.
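
The three missing-value strategies described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's RapidMiner/Weka workflow: each record is assumed to be a dict with `None` marking a missing value, the class attribute is assumed to be named `class` (a hypothetical name), the 60 percent threshold from the abstract is applied to the non-class attributes only, and the toy records are invented for illustration rather than taken from the UCI data.

```python
"""Sketch of three missing-value strategies (illustrative assumptions only)."""
from collections import Counter
from statistics import mean
from typing import Any, Dict, List

# Assumption: a record maps attribute name -> value; None marks a missing value.
Record = Dict[str, Any]
CLASS_ATTR = "class"  # hypothetical name for the heart-disease class attribute


def _is_numeric(values: List[Any]) -> bool:
    """Treat an attribute as numeric if every observed value is an int or float."""
    return all(isinstance(v, (int, float)) for v in values)


def fill_mean_mode(records: List[Record]) -> List[Record]:
    """Strategy 1: fill numeric gaps with the attribute mean, nominal gaps with the mode."""
    if not records:
        return []
    fills: Dict[str, Any] = {}
    for attr in records[0]:  # assumes every record has the same attributes
        observed = [r[attr] for r in records if r[attr] is not None]
        if not observed:
            continue  # nothing to estimate from; the value stays missing
        if _is_numeric(observed):
            fills[attr] = mean(observed)
        else:
            fills[attr] = Counter(observed).most_common(1)[0][0]
    # Build new dicts so the caller's records are not modified.
    return [{a: fills.get(a) if v is None else v for a, v in r.items()} for r in records]


def drop_incomplete_attributes(records: List[Record]) -> List[Record]:
    """Strategy 2: remove every attribute that is missing in at least one record."""
    if not records:
        return []
    complete = [a for a in records[0] if all(r[a] is not None for r in records)]
    return [{a: r[a] for a in complete} for r in records]


def drop_sparse_then_fill(records: List[Record], threshold: float = 0.6) -> List[Record]:
    """Strategy 3: drop records with more than `threshold` of their non-class
    values missing, then fill the remaining gaps with mean/mode."""
    kept = []
    for r in records:
        values = [v for a, v in r.items() if a != CLASS_ATTR]
        missing_ratio = sum(v is None for v in values) / len(values) if values else 0.0
        if missing_ratio <= threshold:
            kept.append(r)
    return fill_mean_mode(kept)


# Toy records invented for illustration; they are not taken from the UCI datasets.
data: List[Record] = [
    {"age": 63, "sex": "male", "chol": 233, "class": "present"},
    {"age": 41, "sex": "male", "chol": None, "class": "absent"},
    {"age": 57, "sex": "female", "chol": 250, "class": "absent"},
    {"age": 52, "sex": None, "chol": None, "class": "present"},
]

if __name__ == "__main__":
    for name, strategy in [("mean/mode", fill_mean_mode),
                           ("drop attributes", drop_incomplete_attributes),
                           ("drop sparse records, then fill", drop_sparse_then_fill)]:
        print(name, strategy(data))
```

With these toy records, the first strategy fills `chol` with 241.5 (the mean of 233 and 250) and `sex` with `"male"`; the second keeps only `age` and `class`, the two attributes with no missing values; the third drops the last record (two of its three non-class values, about 67 percent, are missing) before filling the remaining gap. The trade-off mirrors the abstract: the second strategy loses whole attributes, while the third loses only the sparsest records.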

Access: unrestricted
Rights: This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
Collection: NDLTD