A Fully Bayesian Logistic Regression Model for Classification of ZADA Diabetes Dataset
Main Author: | Masoud M. Hassan |
---|---|
Format: | Article |
Language: | English |
Published: | University of Zakho, 2020-09-01 |
Series: | Science Journal of University of Zakho |
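Subjects: | Diabetes; Bayesian Logistic Regression; Markov Chain Monte Carlo; Classification; Informative Priors |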
Online Access: | https://sjuoz.uoz.edu.krd/index.php/sjuoz/article/view/707 |
id | doaj-3a87adeb0f77458386eeb70c7a033604 |
---|---|
record_format | Article |
spelling | doaj-3a87adeb0f77458386eeb70c7a033604 · 2020-11-25T03:27:42Z · eng · University of Zakho · Science Journal of University of Zakho · ISSN 2663-628X, 2663-6298 · 2020-09-01 · vol. 8, no. 3 · DOI 10.25271/sjuoz.2020.8.3.707 · A Fully Bayesian Logistic Regression Model for Classification of ZADA Diabetes Dataset · Masoud M. Hassan, Dept. of Computer Science, Faculty of Science, University of Zakho, Kurdistan Region, Iraq · https://sjuoz.uoz.edu.krd/index.php/sjuoz/article/view/707 · Diabetes; Bayesian Logistic Regression; Markov Chain Monte Carlo; Classification; Informative Priors |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Masoud M. Hassan |
title | A Fully Bayesian Logistic Regression Model for Classification of ZADA Diabetes Dataset |
publisher | University of Zakho |
series | Science Journal of University of Zakho |
issn | 2663-628X; 2663-6298 |
publishDate | 2020-09-01 |
description | Classification of diabetes data with existing data mining and machine learning algorithms is challenging, and the predictions are not always accurate. We aim to build a model that addresses this challenge (misclassification) and can accurately diagnose and classify diabetes. In this study, we investigated the use of Bayesian Logistic Regression (BLR) for mining such data to diagnose and classify various diabetes conditions. The approach is fully Bayesian and well suited to automated Markov Chain Monte Carlo (MCMC) simulation. Bayesian methods are useful for analysing medical data because of the rich hierarchical models, uncertainty quantification, and prior information they provide. The analysis was carried out on a real medical dataset compiled from 909 patients in Zakho city, with a binary class label and seven independent variables. Three prior distributions (Gaussian, Laplace and Cauchy) were investigated for the proposed model, which was implemented with MCMC. The performance and behaviour of the Bayesian approach were illustrated and compared with traditional classification algorithms on this dataset using 10-fold cross-validation. Overall, the experimental results show that BLR with informative Gaussian priors performed best on the accuracy metrics considered, with an accuracy of 92.53%, a recall of 94.85%, a precision of 91.42% and an F1 score of 93.11%. These results suggest that it is worthwhile to explore BLR with informative prior distributions for predictive modelling tasks in medical studies. |
topic | Diabetes; Bayesian Logistic Regression; Markov Chain Monte Carlo; Classification; Informative Priors |
url | https://sjuoz.uoz.edu.krd/index.php/sjuoz/article/view/707 |
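The description outlines a logistic regression whose coefficients receive Gaussian, Laplace or Cauchy priors, with the posterior sampled by MCMC. The sketch below illustrates that idea using a plain random-walk Metropolis sampler in pure Python. It is not the author's implementation. Several choices here are assumptions: zero-centred priors with a common scale of 2.5 (also applied to the intercept), the step size, the burn-in length, and the synthetic data of 150 hypothetical patients with seven predictors, which stand in for the ZADA records (those are not reproduced here). An informative prior, as used in the article, would replace the zero centre and common scale with values drawn from prior knowledge.

```python
import math
import random


def log1p_exp(t):
    """Numerically stable log(1 + exp(t))."""
    if t > 0:
        return t + math.log1p(math.exp(-t))
    return math.log1p(math.exp(t))


def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood with a logit link; beta[0] is the intercept."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
        total += yi * eta - log1p_exp(eta)  # equals y*log(p) + (1-y)*log(1-p)
    return total


def log_prior(beta, kind, scale):
    """Assumed independent zero-centred priors on every coefficient, constants dropped."""
    if kind == "gaussian":
        return sum(-0.5 * (b / scale) ** 2 for b in beta)
    if kind == "laplace":
        return sum(-abs(b) / scale for b in beta)
    if kind == "cauchy":
        return sum(-math.log1p((b / scale) ** 2) for b in beta)
    raise ValueError(f"unknown prior: {kind}")


def metropolis(X, y, kind, scale=2.5, n_iter=2500, step=0.1, seed=0):
    """Random-walk Metropolis over the intercept and all slopes jointly."""
    rng = random.Random(seed)
    beta = [0.0] * (len(X[0]) + 1)
    current = log_likelihood(beta, X, y) + log_prior(beta, kind, scale)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_likelihood(proposal, X, y) + log_prior(proposal, kind, scale)
        # Symmetric proposal, so accept with probability min(1, exp(candidate - current)).
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        draws.append(beta)
    return draws, accepted / n_iter


def predict_proba(draws, x, burn_in=1000):
    """Posterior predictive P(y = 1 | x), averaged over the draws kept after burn-in."""
    kept = draws[burn_in:]
    total = 0.0
    for beta in kept:
        eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
        total += math.exp(-log1p_exp(-eta))  # stable sigmoid(eta)
    return total / len(kept)


# Hypothetical synthetic data standing in for the ZADA records: 150 patients, 7 predictors.
data_rng = random.Random(42)
true_beta = [0.5, 1.5, -1.0, 0.8, 0.0, 0.0, 1.2, -0.7]
X = [[data_rng.gauss(0.0, 1.0) for _ in range(7)] for _ in range(150)]
y = []
for xi in X:
    eta = true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], xi))
    y.append(1 if data_rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

results = {}
for kind in ("gaussian", "laplace", "cauchy"):
    draws, rate = metropolis(X, y, kind)
    preds = [1 if predict_proba(draws, xi) >= 0.5 else 0 for xi in X]
    accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
    results[kind] = (rate, accuracy)
    print(f"{kind:8s} acceptance={rate:.2f} training accuracy={accuracy:.3f}")
```

The three priors differ only in `log_prior`. The Gaussian shrinks all coefficients smoothly, the Laplace pulls small coefficients harder towards zero, and the heavy-tailed Cauchy leaves large coefficients comparatively unpenalised. A production analysis would more likely use a dedicated sampler (for example, HMC/NUTS) and convergence diagnostics than this hand-rolled loop.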
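The description evaluates the classifiers with 10-fold cross-validation and reports accuracy, recall, precision and F1. The sketch below shows one common way to compute these: shuffle the records into ten folds, fit on nine, predict the held-out tenth, and pool the confusion counts. Several details are assumptions. The abstract does not say whether scores were pooled or averaged per fold, and class 1 is taken as the positive (diabetic) class. `fit`, `predict` and the one-feature toy data are hypothetical placeholders. In the article's setting, `fit` would run the MCMC sampler and `predict` would threshold the posterior predictive probability.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def confusion(y_true, y_pred):
    """Counts (tp, fp, fn, tn) with 1 treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """k-fold CV: train on k-1 folds, predict the held-out fold, pool the counts."""
    totals = [0, 0, 0, 0]
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in fold]
        counts = confusion([y[i] for i in fold], preds)
        totals = [a + b for a, b in zip(totals, counts)]
    return scores(*totals)


# Hypothetical stand-in classifier: threshold one feature halfway between the class means.
def fit(X_train, y_train):
    pos = [x[0] for x, t in zip(X_train, y_train) if t == 1]
    neg = [x[0] for x, t in zip(X_train, y_train) if t == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    return 1 if x[0] > threshold else 0


# Hypothetical toy data: one feature, label 1 when the feature is positive.
rng = random.Random(1)
X = [[rng.uniform(-1.0, 1.0)] for _ in range(200)]
y = [1 if x[0] > 0 else 0 for x in X]
result = cross_validate(X, y, fit, predict, k=10)
print({name: round(value, 4) for name, value in result.items()})
```

Pooling the counts before computing precision and recall avoids undefined per-fold ratios when a small fold happens to contain no predicted positives. Averaging per-fold scores is an equally common alternative and can give slightly different numbers.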
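As a consistency check on the figures in the description, note that F1 is the harmonic mean of precision $P$ and recall $R$. Substituting the published values $P = 0.9142$ and $R = 0.9485$ gives:

$$
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.9142 \times 0.9485}{0.9142 + 0.9485} = \frac{1.7342}{1.8627} \approx 0.9310
$$

This agrees with the reported 93.11% to within 0.01 percentage points. The small gap may come from rounding of the published precision and recall, or from averaging F1 across folds, which the abstract does not specify.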