Data Mining of Medical Datasets with Missing Attributes from Different Sources

Bibliographic Details
Main Author:	Sajja, Sunitha
Language:	English
Published:	Youngstown State University / OhioLINK 2010
Subjects:	Computer Science; data mining; missing attributes; data classification; outliers
Online Access:	http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263

id	ndltd-OhioLink-oai-etd.ohiolink.edu-ysu1300298263
record_format	oai_dc
spelling	ndltd-OhioLink-oai-etd.ohiolink.edu-ysu1300298263 2021-08-03T06:18:11Z Data Mining of Medical Datasets with Missing Attributes from Different Sources Sajja, Sunitha Computer Science data mining missing attributes data classification outliers Two major problems in data mining are (1) dealing with missing values in the datasets used for knowledge discovery, and (2) using one dataset as a predictor of other datasets. We explore these problems using four different datasets from the UCI Machine Learning Repository, each from a different source and each with different missing values. Each dataset contains 13 attributes and one class attribute, which denotes the presence or absence of heart disease. Missing values were handled in three ways: first, by replacing them using the usual mean and mode method; second, by removing the attributes that contain missing values; and third, by removing the records with more than 60 percent of their values missing and filling in the remaining missing values. We also experimented with different classification techniques on the medical datasets, including Decision Tree, Naive Bayes, and MultiLayerPerceptron, using the RapidMiner and Weka tools. The consistency of the datasets was assessed by combining them and comparing the results on the combined dataset with the classification errors of the individual datasets. The results show that when few values are missing, the mean and mode method works well. When many values are missing, removing the instances with more than 60 percent of their values missing and filling in the remaining missing values, together with different preprocessing steps, works better. Using one dataset as a predictor of another dataset produced moderate accuracy.
2010 English text Youngstown State University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
collection	NDLTD
language	English
sources	NDLTD
topic	Computer Science data mining missing attributes data classification outliers
spellingShingle	Computer Science data mining missing attributes data classification outliers Sajja, Sunitha Data Mining of Medical Datasets with Missing Attributes from Different Sources
author	Sajja, Sunitha
author_facet	Sajja, Sunitha
author_sort	Sajja, Sunitha
title	Data Mining of Medical Datasets with Missing Attributes from Different Sources
title_short	Data Mining of Medical Datasets with Missing Attributes from Different Sources
title_full	Data Mining of Medical Datasets with Missing Attributes from Different Sources
title_fullStr	Data Mining of Medical Datasets with Missing Attributes from Different Sources
title_full_unstemmed	Data Mining of Medical Datasets with Missing Attributes from Different Sources
title_sort	data mining of medical datasets with missing attributes from different sources
publisher	Youngstown State University / OhioLINK
publishDate	2010
url	http://rave.ohiolink.edu/etdc/view?acc_num=ysu1300298263
work_keys_str_mv	AT sajjasunitha dataminingofmedicaldatasetswithmissingattributesfromdifferentsources
_version_	1719434322798182400
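
The three missing-value strategies named in the abstract can be illustrated with a minimal sketch. This is not the thesis's code (the abstract states that RapidMiner and Weka were used); the function names, the use of None as the missing-value marker, the split into numeric and nominal columns, and the toy rows are all assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean

MISSING = None  # assumption: a missing value is represented as None


def fill_mean_mode(rows, numeric_cols):
    """Strategy 1: fill gaps with the column mean (numeric) or mode (nominal)."""
    if not rows:
        return []
    fill = []
    for c in range(len(rows[0])):
        present = [r[c] for r in rows if r[c] is not MISSING]
        if not present:
            fill.append(MISSING)  # nothing observed, so nothing to impute from
        elif c in numeric_cols:
            fill.append(mean(present))
        else:
            fill.append(Counter(present).most_common(1)[0][0])
    return [[fill[c] if v is MISSING else v for c, v in enumerate(r)]
            for r in rows]


def drop_incomplete_columns(rows):
    """Strategy 2: remove every attribute that has at least one missing value."""
    if not rows:
        return []
    keep = [c for c in range(len(rows[0]))
            if all(r[c] is not MISSING for r in rows)]
    return [[r[c] for c in keep] for r in rows]


def drop_sparse_rows_then_fill(rows, numeric_cols, threshold=0.6):
    """Strategy 3: drop records with more than `threshold` of their values
    missing, then fill the remaining gaps with strategy 1."""
    kept = [r for r in rows
            if sum(v is MISSING for v in r) / len(r) <= threshold]
    return fill_mean_mode(kept, numeric_cols)


# Made-up toy records (age, sex, cholesterol); not taken from the UCI data.
toy_rows = [
    [63, "male", 233],
    [None, "female", 250],
    [41, None, None],
    [None, None, None],
    [50, "male", None],
]
cleaned = drop_sparse_rows_then_fill(toy_rows, numeric_cols={0, 2})
```

In this sketch the 60 percent cutoff is a parameter, so the same function can be rerun with other thresholds when comparing preprocessing choices.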

Similar Items