Knowledge Discovery and Data Mining Using Demographic and Clinical Data to Diagnose Heart Disease.
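Main Author: | Fernandez Sanchez, Javier |
---|---|
Format: | Others |
Language: | English |
Published: | KTH, Skolan för kemi, bioteknologi och hälsa (CBH), 2018 |
Subjects: | machine learning, data science, artificial intelligence, data mining, cardiovascular disease, CVD, exploratory analysis, EDA, clinical data, support vector machines, preprocessing, decision trees, logistic regression, KNN, adaboost, xgboost, random forest, health, healthcare, Medical Engineering, Medicinteknik |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233978 |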
id | ndltd-UPSALLA1-oai-DiVA.org-kth-233978 |
---|---|
record_format | oai_dc |
spelling | ndltd-UPSALLA1-oai-DiVA.org-kth-233978 · 2018-09-06T06:21:51Z · Knowledge Discovery and Data Mining Using Demographic and Clinical Data to Diagnose Heart Disease. · eng · Knowledge Discovery och Data mining med hjälp av demografiska och kliniska data för att diagnostisera hjärtsjukdomar. (Swedish title; in English: Knowledge Discovery and Data Mining Using Demographic and Clinical Data to Diagnose Heart Diseases.) · Fernandez Sanchez, Javier · KTH, Skolan för kemi, bioteknologi och hälsa (CBH) · 2018 · machine learning, data science, artificial intelligence, data mining, cardiovascular disease, CVD, exploratory analysis, EDA, clinical data, support vector machines, preprocessing, decision trees, logistic regression, KNN, adaboost, xgboost, random forest, health, healthcare, Medical Engineering, Medicinteknik · Student thesis · info:eu-repo/semantics/bachelorThesis · text · http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233978 · TRITA-CBH-GRU ; 2018:80 · application/pdf · info:eu-repo/semantics/openAccess |
collection | NDLTD |
language | English |
format | Others |
sources | NDLTD |
topic | machine learning, data science, artificial intelligence, data mining, cardiovascular disease, CVD, exploratory analysis, EDA, clinical data, support vector machines, preprocessing, decision trees, logistic regression, KNN, adaboost, xgboost, random forest, health, healthcare, Medical Engineering, Medicinteknik |
description | Cardiovascular disease (CVD) is the leading cause of morbidity, mortality, premature death and reduced quality of life among citizens of the EU. CVD has been reported to place a major economic burden on health care systems in terms of hospitalizations, rehabilitation services, physician visits and medication. Data mining techniques applied to clinical data have become a promising tool for preventing, diagnosing and treating CVD. In this thesis, Knowledge Discovery and Data Mining (KDD) was employed to analyse clinical and demographic data that could be used to diagnose coronary artery disease (CAD). The exploratory data analysis (EDA) showed that elderly female patients with higher cholesterol, maximum achieved heart rate and ST depression are more likely to be diagnosed with heart disease. Furthermore, patients with atypical angina tend to be older, with slightly higher cholesterol and maximum achieved heart rate, than patients with asymptomatic chest pain. Moreover, patients with exercise-induced angina had lower maximum achieved heart rates than those without it. We verified that patients who experience exercise-induced angina and asymptomatic chest pain are more likely to be diagnosed with heart disease. In the modelling stage, Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Decision Tree, Bagging and Boosting methods were evaluated using stratified 10-fold cross-validation. The models achieved mean F-scores of 78-83% and mean AUCs of 85-88%. Among all the models, the highest score was obtained by the Radial Basis Function kernel Support Vector Machine (RBF-SVM), with an F-score of 82.5% ± 4.7% and an AUC of 87.6% ± 5.8%. Our research confirmed that data mining techniques, used alongside patients' clinical and demographic characteristics, can support physicians in interpreting heart disease diagnoses. |
author | Fernandez Sanchez, Javier |
title | Knowledge Discovery and Data Mining Using Demographic and Clinical Data to Diagnose Heart Disease. |
publisher | KTH, Skolan för kemi, bioteknologi och hälsa (CBH) |
publishDate | 2018 |
url | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233978 |
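The abstract describes the evaluation protocol (several classifier families scored with stratified 10-fold cross-validation on F-score and AUC, reported as mean ± standard deviation) but includes no code. A minimal sketch of such a protocol might look like the following, assuming scikit-learn, default hyperparameters, per-fold standardization, and synthetic data from `make_classification` in place of the clinical dataset. Random forest stands in for the bagging family and AdaBoost for boosting; XGBoost, also listed among the subjects, is left out to avoid a third-party dependency. The function names `build_models` and `evaluate_models` are hypothetical, nothing here is taken from the thesis itself, and the printed scores say nothing about the thesis results.

```python
"""Sketch: compare classifiers with stratified 10-fold cross-validation,
reporting mean and standard deviation of F-score and ROC AUC.

Assumptions (not from the thesis): scikit-learn, default hyperparameters,
standardization fit inside each training fold, and synthetic data from
make_classification instead of the real clinical dataset.
"""
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """Return one illustrative model per family named in the abstract."""
    return {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "rbf_svm": SVC(kernel="rbf"),
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "random_forest": RandomForestClassifier(random_state=0),  # bagging family
        "adaboost": AdaBoostClassifier(random_state=0),  # boosting family
    }


def evaluate_models(X, y, n_splits=10, seed=0):
    """Return {name: (mean_f1, std_f1, mean_auc, std_auc)} over the folds."""
    # Stratification keeps the class ratio roughly equal in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in build_models().items():
        # The scaler is part of the pipeline, so it is refit on each training split.
        pipeline = make_pipeline(StandardScaler(), model)
        scores = cross_validate(pipeline, X, y, cv=cv, scoring=("f1", "roc_auc"))
        f1, auc = scores["test_f1"], scores["test_roc_auc"]
        results[name] = (f1.mean(), f1.std(), auc.mean(), auc.std())
    return results


if __name__ == "__main__":
    # Synthetic stand-in for the clinical data: 300 samples, 13 features.
    X, y = make_classification(n_samples=300, n_features=13, n_informative=6,
                               random_state=0)
    for name, (f1_m, f1_s, auc_m, auc_s) in evaluate_models(X, y).items():
        print(f"{name:20s} F1 {f1_m:.3f} +/- {f1_s:.3f}  AUC {auc_m:.3f} +/- {auc_s:.3f}")
```

Keeping `StandardScaler` inside the pipeline means no statistics from the held-out fold leak into training; the distance- and kernel-based models (KNN and RBF-SVM) are the ones most sensitive to this scaling.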