Knowledge Discovery and Data Mining Using Demographic and Clinical Data to Diagnose Heart Disease.

Cardiovascular disease (CVD) is the leading cause of morbidity, mortality, premature death and reduced quality of life for the citizens of the EU. It has been reported that CVD represents a major economic load on health care systems in terms of hospitalizations, rehabilitation services, physician...

Bibliographic Details
Main Author: Fernandez Sanchez, Javier
Format: Others
Language: English
Published: KTH, Skolan för kemi, bioteknologi och hälsa (CBH) 2018
Subjects:
CVD
EDA
KNN
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233978
id ndltd-UPSALLA1-oai-DiVA.org-kth-233978
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-kth-233978 | 2018-09-06T06:21:51Z | Knowledge Discovery and Data Mining Using Demographic and Clinical Data to Diagnose Heart Disease. | eng | Knowledge Discovery och Data mining med hjälp av demografiska och kliniska data för att diagnostisera hjärtsjukdomar. | Fernandez Sanchez, Javier | KTH, Skolan för kemi, bioteknologi och hälsa (CBH) | 2018 | Student thesis | info:eu-repo/semantics/bachelorThesis | text | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233978 | TRITA-CBH-GRU ; 2018:80 | application/pdf | info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic machine learning
data science
artificial intelligence
data mining
cardiovascular disease
CVD
exploratory analysis
EDA
clinical data
support vector machines
preprocessing
decision trees
logistic regression
KNN
adaboost
xgboost
random forest
health
healthcare
Medical Engineering
Medicinteknik
description Cardiovascular disease (CVD) is the leading cause of morbidity, mortality, premature death and reduced quality of life for the citizens of the EU. It has been reported that CVD represents a major economic load on health care systems in terms of hospitalizations, rehabilitation services, physician visits and medication. Data mining techniques applied to clinical data have become an interesting tool to prevent, diagnose or treat CVD. In this thesis, Knowledge Discovery and Data Mining (KDD) was employed to analyse clinical and demographic data that could be used to diagnose coronary artery disease (CAD). The exploratory data analysis (EDA) showed that elderly female patients with higher levels of cholesterol, maximum achieved heart rate and ST depression are more prone to be diagnosed with heart disease. Furthermore, patients with atypical angina are more likely to be elderly, with slightly higher levels of cholesterol and maximum achieved heart rate, than patients with asymptomatic chest pain. Moreover, patients with exercise-induced angina had lower values of maximum achieved heart rate than those who do not experience it. We could verify that patients who experience exercise-induced angina and asymptomatic chest pain are more likely to be diagnosed with heart disease. In addition, Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Decision Tree, Bagging and Boosting methods were evaluated using stratified 10-fold cross-validation. The learning models achieved an average F-score of 78-83% and a mean AUC of 85-88%. Among all the models, the highest score was obtained by the Radial Basis Function kernel Support Vector Machine (RBF-SVM), with an F-score of 82.5% ± 4.7% and an AUC of 87.6% ± 5.8%. Our research confirmed that data mining techniques can support physicians in their interpretation of heart disease diagnosis, in addition to the clinical and demographic characteristics of patients.
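The evaluation protocol named in the description (stratified 10-fold cross-validation, reported as mean ± standard deviation of F-score and AUC) can be illustrated with a minimal standard-library sketch. Everything here is an assumption made for illustration: the data are synthetic, the one-feature threshold classifier is a hypothetical stand-in for the models in the thesis, and the function names are not taken from the thesis.

```python
import random
import statistics
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds preserve class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a proportional share.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def f1_score(y_true, y_pred, positive=1):
    """F-score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC: chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(features, labels, k=10):
    """Return ((mean F1, sd F1), (mean AUC, sd AUC)) over k stratified folds."""
    f1s, aucs = [], []
    for train, test in stratified_k_fold(labels, k):
        # Hypothetical stand-in model: threshold at the midpoint of class means,
        # fitted on the training fold only.
        mean_pos = statistics.mean(features[i] for i in train if labels[i] == 1)
        mean_neg = statistics.mean(features[i] for i in train if labels[i] == 0)
        threshold = (mean_pos + mean_neg) / 2
        sign = 1 if mean_pos >= mean_neg else -1
        scores = [sign * (features[i] - threshold) for i in test]
        preds = [1 if s > 0 else 0 for s in scores]
        truth = [labels[i] for i in test]
        f1s.append(f1_score(truth, preds))
        aucs.append(roc_auc(truth, scores))
    return ((statistics.mean(f1s), statistics.stdev(f1s)),
            (statistics.mean(aucs), statistics.stdev(aucs)))


# Synthetic data only (not the thesis data): 60 positives, 40 negatives.
rng = random.Random(42)
labels = [1] * 60 + [0] * 40
features = [rng.gauss(2.0 if y == 1 else 0.0, 1.0) for y in labels]

(f1_mean, f1_sd), (auc_mean, auc_sd) = evaluate(features, labels)
print(f"F-score {f1_mean:.3f} ± {f1_sd:.3f}, AUC {auc_mean:.3f} ± {auc_sd:.3f}")
```

Stratifying keeps the share of diagnosed and non-diagnosed cases the same in every fold, so per-fold F-score and AUC are comparable and their spread across folds gives the ± figures quoted in the description.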
author Fernandez Sanchez, Javier
title Knowledge Discovery and Data Mining Using Demographic and Clinical Data to Diagnose Heart Disease.
publisher KTH, Skolan för kemi, bioteknologi och hälsa (CBH)
publishDate 2018
url http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233978