A Fully Bayesian Logistic Regression Model for Classification of ZADA Diabetes Dataset

Classification of diabetes data with existing data mining and machine learning algorithms is challenging and the predictions are not always accurate. We aim to build a model that effectively addresses these challenges (misclassification) and can accurately diagnose and classify diabetes. In this stu...

Bibliographic Details
Main Author: Masoud M. Hassan
Format: Article
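Language: English
Published: University of Zakho 2020-09-01
Series: Science Journal of University of Zakho
Subjects: Diabetes; Bayesian Logistic Regression; Markov Chain Monte Carlo; Classification; Informative Priors
Online Access: https://sjuoz.uoz.edu.krd/index.php/sjuoz/article/view/707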
id doaj-3a87adeb0f77458386eeb70c7a033604
record_format Article
spelling doaj-3a87adeb0f77458386eeb70c7a033604 2020-11-25T03:27:42Z eng
University of Zakho
Science Journal of University of Zakho
ISSN 2663-628X, 2663-6298
2020-09-01, vol. 8, no. 3
DOI 10.25271/sjuoz.2020.8.3.707
A Fully Bayesian Logistic Regression Model for Classification of ZADA Diabetes Dataset
Masoud M. Hassan, Dept. of Computer Science, Faculty of Science, University of Zakho, Kurdistan Region, Iraq
https://sjuoz.uoz.edu.krd/index.php/sjuoz/article/view/707
Diabetes; Bayesian Logistic Regression; Markov Chain Monte Carlo; Classification; Informative Priors
collection DOAJ
language English
format Article
sources DOAJ
author Masoud M. Hassan
spellingShingle Masoud M. Hassan
A Fully Bayesian Logistic Regression Model for Classification of ZADA Diabetes Dataset
Science Journal of University of Zakho
Diabetes
Bayesian Logistic Regression
Markov Chain Monte Carlo
Classification
Informative Priors
author_facet Masoud M. Hassan
author_sort Masoud M. Hassan
title A Fully Bayesian Logistic Regression Model for Classification of ZADA Diabetes Dataset
title_sort fully bayesian logistic regression model for classification of zada diabetes dataset
publisher University of Zakho
series Science Journal of University of Zakho
issn 2663-628X
2663-6298
publishDate 2020-09-01
description Classifying diabetes data with existing data mining and machine learning algorithms is challenging, and the predictions are not always accurate. We aim to build a model that reduces misclassification and can accurately diagnose and classify diabetes. In this study, we investigated the use of Bayesian Logistic Regression (BLR) for mining such data to diagnose and classify various diabetes conditions. The approach is fully Bayesian and well suited to automated Markov Chain Monte Carlo (MCMC) simulation. Bayesian methods are useful for analysing medical data because they support rich hierarchical models, quantify uncertainty, and incorporate prior information. The analysis was carried out on a real medical dataset of 909 patients in Zakho city, with a binary class label and seven independent variables. Three prior distributions (Gaussian, Laplace and Cauchy) were investigated for the proposed model, which was fitted by MCMC. The performance and behaviour of the Bayesian approach were illustrated and compared with traditional classification algorithms on this dataset using 10-fold cross-validation. Overall, BLR with informative Gaussian priors performed better in terms of various accuracy metrics, achieving an accuracy of 92.53%, a recall of 94.85%, a precision of 91.42% and an F1 score of 93.11%. These results suggest that it is worthwhile to explore the application of BLR with informative prior distributions to predictive modelling tasks in medical studies.
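The description names the ingredients (a logistic likelihood, a Gaussian, Laplace or Cauchy prior on the coefficients, and MCMC sampling) without showing how they fit together. The following is a minimal sketch of that combination, not the author's implementation: it uses a plain random-walk Metropolis sampler, synthetic data in place of the ZADA dataset, zero-centred priors with a hypothetical scale of 2.5, and illustrative function names (`log_prior`, `log_likelihood`, `metropolis`, `make_data`) that do not come from the article.

```python
import math
import random


def log_prior(beta, kind, scale):
    """Unnormalised log-density of independent zero-centred priors (assumed form)."""
    if kind == "gaussian":
        return sum(-0.5 * (b / scale) ** 2 for b in beta)
    if kind == "laplace":
        return sum(-abs(b) / scale for b in beta)
    if kind == "cauchy":
        return sum(-math.log1p((b / scale) ** 2) for b in beta)
    raise ValueError(f"unknown prior: {kind}")


def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood with a logit link, evaluated stably."""
    total = 0.0
    for row, label in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # log p(y | eta) = y * eta - log(1 + exp(eta))
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += label * eta - softplus
    return total


def metropolis(X, y, kind, scale=2.5, steps=3000, burn_in=1000,
               step_size=0.15, seed=0):
    """Random-walk Metropolis draws from the posterior over the coefficients."""
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    current = log_likelihood(beta, X, y) + log_prior(beta, kind, scale)
    draws = []
    for i in range(steps):
        proposal = [b + rng.gauss(0.0, step_size) for b in beta]
        candidate = (log_likelihood(proposal, X, y)
                     + log_prior(proposal, kind, scale))
        # Accept with probability min(1, exp(candidate - current)).
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
        if i >= burn_in:
            draws.append(beta)
    return draws


def make_data(n=200, seed=1):
    """Synthetic stand-in data: an intercept column plus two features."""
    rng = random.Random(seed)
    true_beta = [-0.5, 2.0, -1.0]  # hypothetical coefficients
    X, y = [], []
    for _ in range(n):
        row = [1.0, rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        eta = sum(b * x for b, x in zip(true_beta, row))
        p = 1.0 / (1.0 + math.exp(-eta))
        X.append(row)
        y.append(1 if rng.random() < p else 0)
    return X, y


X, y = make_data()
posterior_means = {}
for kind in ("gaussian", "laplace", "cauchy"):
    draws = metropolis(X, y, kind)
    posterior_means[kind] = [
        sum(d[j] for d in draws) / len(draws) for j in range(len(X[0]))
    ]
    print(kind, [round(m, 2) for m in posterior_means[kind]])
```

Only `log_prior` changes between the three runs, which is what makes this kind of prior comparison easy to set up. An informative prior, as favoured in the description, would additionally centre each coefficient on a value taken from earlier evidence rather than on zero.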
topic Diabetes
Bayesian Logistic Regression
Markov Chain Monte Carlo
Classification
Informative Priors
url https://sjuoz.uoz.edu.krd/index.php/sjuoz/article/view/707
work_keys_str_mv AT masoudmhassan afullybayesianlogisticregressionmodelforclassificationofzadadiabetesdataset
AT masoudmhassan fullybayesianlogisticregressionmodelforclassificationofzadadiabetesdataset
_version_ 1724587645769613312