XGBoost-Based Framework for Smoking-Induced Noncommunicable Disease Prediction
Smoking-induced noncommunicable diseases (SiNCDs) have become a significant threat to public health and cause of death globally. In the last decade, numerous studies have been proposed using artificial intelligence techniques to predict the risk of developing SiNCDs. However, determining the most si...
Main Authors: | Khishigsuren Davagdorj, Van Huy Pham, Nipon Theera-Umpon, Keun Ho Ryu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-09-01 |
Series: | International Journal of Environmental Research and Public Health |
Subjects: | smoking; noncommunicable disease; feature selection; extreme gradient boosting |
Online Access: | https://www.mdpi.com/1660-4601/17/18/6513 |
id |
doaj-d4ea8bcce22849ca9a6db9dbcca6541a |
---|---|
record_format |
Article |
spelling |
doaj-d4ea8bcce22849ca9a6db9dbcca6541a; 2020-11-25T02:53:00Z; eng; MDPI AG; International Journal of Environmental Research and Public Health; ISSN 1661-7827, 1660-4601; 2020-09-01; 17; 6513; 6513; DOI 10.3390/ijerph17186513; XGBoost-Based Framework for Smoking-Induced Noncommunicable Disease Prediction; Khishigsuren Davagdorj (0), Van Huy Pham (1), Nipon Theera-Umpon (2), Keun Ho Ryu (3); affiliations, in the order listed: Database and Bioinformatics Laboratory, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea; Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh 700000, Vietnam; Department of Electrical Engineering, Faculty of Engineering, Chiang Mai University, Chiang Mai 50200, Thailand; Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh 700000, Vietnam |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Khishigsuren Davagdorj, Van Huy Pham, Nipon Theera-Umpon, Keun Ho Ryu |
title |
XGBoost-Based Framework for Smoking-Induced Noncommunicable Disease Prediction |
publisher |
MDPI AG |
series |
International Journal of Environmental Research and Public Health |
issn |
1661-7827 1660-4601 |
publishDate |
2020-09-01 |
description |
Smoking-induced noncommunicable diseases (SiNCDs) have become a significant threat to public health and a cause of death globally. In the last decade, numerous studies have proposed artificial intelligence techniques to predict the risk of developing SiNCDs. However, determining the most significant features and developing interpretable models remain challenging in such systems. In this study, we propose an efficient extreme gradient boosting (XGBoost)-based framework that incorporates a hybrid feature selection (HFS) method for SiNCD prediction among the general population in South Korea and the United States. Initially, HFS is performed in three stages: (I) significant features are selected by t-test and chi-square test; (II) multicollinearity analysis serves to obtain dissimilar features; (III) the final selection of the best representative features is based on the least absolute shrinkage and selection operator (LASSO). Then, the selected features are fed into the XGBoost predictive model. The experimental results show that our proposed model outperforms several existing baseline models. In addition, the proposed model provides important features in order to enhance the interpretability of the SiNCD prediction model. Consequently, the XGBoost-based framework is expected to contribute to the early diagnosis and prevention of SiNCDs in public health. |
topic |
smoking; noncommunicable disease; feature selection; extreme gradient boosting |
url |
https://www.mdpi.com/1660-4601/17/18/6513 |
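
The three-stage hybrid feature selection named in the description (statistical test, multicollinearity filter, LASSO) can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no thresholds or data, so the cutoffs `t_min`, `r_max`, and `lam`, the helper names, and the synthetic features below are all assumptions. Pairwise Pearson correlation stands in for the multicollinearity analysis, the chi-square test for categorical features is omitted, and the final XGBoost model is left out so that the sketch runs on the Python standard library alone.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va / len(a) + vb / len(b))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def standardize(x):
    """Scale a column to zero mean and unit (population) variance."""
    m, s = statistics.fmean(x), statistics.pstdev(x)
    return [(v - m) / s for v in x]


def lasso_cd(cols, y, lam, n_iter=200):
    """LASSO by coordinate descent on standardized columns.

    Minimizes (1/2n) * ||y - mean(y) - X b||^2 + lam * ||b||_1.
    """
    n = len(y)
    y_mean = statistics.fmean(y)
    resid = [v - y_mean for v in y]
    beta = [0.0] * len(cols)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of column j with the partial residual.
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = math.copysign(max(abs(rho) - lam, 0.0), rho)  # soft threshold
            delta = new - beta[j]
            if delta:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta


def hybrid_select(X, y, t_min=2.0, r_max=0.8, lam=0.05):
    """X maps feature name -> list of floats; y is a list of 0/1 labels."""
    # Stage I: keep features whose class means differ (|t| above an assumed cutoff).
    scores = {}
    for name, col in X.items():
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        scores[name] = abs(welch_t(pos, neg))
    stage1 = sorted((k for k in scores if scores[k] >= t_min), key=scores.get, reverse=True)
    # Stage II: greedily drop features highly correlated with one already kept.
    stage2 = []
    for name in stage1:
        if all(abs(pearson(X[name], X[k])) < r_max for k in stage2):
            stage2.append(name)
    # Stage III: keep features with a nonzero LASSO coefficient.
    beta = lasso_cd([standardize(X[k]) for k in stage2], y, lam)
    return [k for k, b in zip(stage2, beta) if b != 0.0]


# Deterministic synthetic data (hypothetical, for illustration only).
n = 420
y = [i % 2 for i in range(n)]
signal = [2.0 * (i % 2) + ((i // 2) % 7) / 3.0 for i in range(n)]
X = {
    "signal": signal,
    "signal_copy": [s + 0.01 * ((i // 2) % 3) for i, s in enumerate(signal)],
    "weak": [0.8 * (i % 2) + ((i // 2) % 5) / 2.0 for i in range(n)],
    "noise": [float((i // 2) % 11) for i in range(n)],
}
selected = hybrid_select(X, y)
print(selected)
```

In this toy setup, `noise` has identical class means and fails Stage I, and the near-duplicate `signal_copy` is removed in Stage II. In a full pipeline, the surviving columns would then be used to train a gradient-boosted classifier such as `xgboost.XGBClassifier`.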