Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa
Abstract. Aim: HIV prevention measures in sub-Saharan Africa still fall short of the UNAIDS 90–90–90 fast-track targets set in 2014. Identifying predictors of HIV status may facilitate targeted screening interventions that improve health care. We aimed at identifying HIV predictors as well a...
Main Authors: | Charles K. Mutai, Patrick E. McSharry, Innocent Ngaruye, Edouard Musabanganji |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2021-07-01 |
Series: | BMC Medical Research Methodology |
Subjects: | Socio-behavioral; Screening; High-risk; Predictors; XGBoost |
Online Access: | https://doi.org/10.1186/s12874-021-01346-2 |
id |
doaj-5c43861655044bd2bf4cb4d83d6ef83b |
---|---|
record_format |
Article |
spelling |
doaj-5c43861655044bd2bf4cb4d83d6ef83b; 2021-08-01T11:43:46Z; eng; BMC; BMC Medical Research Methodology; ISSN 1471-2288; 2021-07-01; volume 21, issue 1, pages 1–11; 10.1186/s12874-021-01346-2; Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa; Charles K. Mutai (African Center of Excellence in Data Science, University of Rwanda); Patrick E. McSharry (African Center of Excellence in Data Science, University of Rwanda); Innocent Ngaruye (College of Science and Technology, University of Rwanda); Edouard Musabanganji (College of Business and Economics, University of Rwanda) |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Charles K. Mutai Patrick E. McSharry Innocent Ngaruye Edouard Musabanganji |
title |
Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa |
publisher |
BMC |
series |
BMC Medical Research Methodology |
issn |
1471-2288 |
publishDate |
2021-07-01 |
description |
Abstract. Aim: HIV prevention measures in sub-Saharan Africa still fall short of the UNAIDS 90–90–90 fast-track targets set in 2014. Identifying predictors of HIV status may facilitate targeted screening interventions that improve health care. We aimed to identify HIV predictors and to predict persons at high risk of infection. Method: We applied machine learning approaches to build models using Population-based HIV Impact Assessment (PHIA) data for 41,939 male and 45,105 female respondents, with 30 and 40 variables respectively, from four sub-Saharan African countries. We trained and validated the algorithms on 80% of the data and tested them on the remaining 20%, rotating around the left-out country. The algorithm with the best mean F1 score was retained and trained on the most predictive variables. We used the model to identify people living with HIV and individuals with a higher likelihood of contracting the disease. Results: The XGBoost algorithm appeared to identify HIV positivity significantly better than the other five algorithms, with mean F1 scores of 90% and 92% for males and females respectively. Among the eight most predictive features in both sexes were: age, relationship with family head, highest level of education, highest grade at that school level, work for payment, avoiding pregnancy, age at first sexual experience, and wealth quintile. Model performance using these variables increased significantly compared with using all the variables. We estimated that five males and 19 females would need to be tested to find one HIV-positive individual. We also predicted that 4.14% of males and 10.81% of females are at high risk of infection. Conclusion: Our findings suggest that the XGBoost algorithm, applied to socio-behavioural data, can identify HIV predictors and predict individuals at high risk of infection for targeted screening. |
topic |
Socio-behavioral Screening High-risk Predictors XGBoost |
url |
https://doi.org/10.1186/s12874-021-01346-2 |
_version_ |
1721245559365304320 |
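
The evaluation scheme summarised in the abstract (hold out one country in turn, score each fold by F1, then average) can be illustrated with a minimal sketch. This is not the authors' code. The record layout, the field names `country`, `age` and `hiv_positive`, the synthetic `toy_records`, and the `fit_age_rule` stand-in for a real classifier such as XGBoost are all hypothetical. The abstract does not say how the number of people needing a test was derived; the sketch assumes it is the number of people flagged by the model per true positive found (1 / precision).

```python
from collections import defaultdict
from math import ceil


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def leave_one_country_out(records):
    """Yield (held_out_country, train_rows, test_rows), holding out each country once."""
    by_country = defaultdict(list)
    for row in records:
        by_country[row["country"]].append(row)
    for held_out in sorted(by_country):
        train = [r for c, rows in by_country.items() if c != held_out for r in rows]
        yield held_out, train, by_country[held_out]


def number_needed_to_test(y_true, y_pred):
    """Assumed definition: people flagged per true positive found, rounded up."""
    flagged = sum(y_pred)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return ceil(flagged / tp) if tp else None


def fit_age_rule(train):
    """Hypothetical stand-in for a real model: flag anyone at or above
    the mean age of the HIV-positive training rows."""
    ages = [r["age"] for r in train if r["hiv_positive"] == 1]
    cutoff = sum(ages) / len(ages)
    return lambda r: 1 if r["age"] >= cutoff else 0


def mean_f1_across_countries(records, fit):
    """Train on all countries but one, test on the held-out one, average the F1 scores."""
    scores = []
    for _, train, test in leave_one_country_out(records):
        predict = fit(train)
        y_true = [r["hiv_positive"] for r in test]
        y_pred = [predict(r) for r in test]
        scores.append(f1_score(y_true, y_pred))
    return sum(scores) / len(scores)


# Synthetic rows for illustration only; these are not PHIA data.
toy_records = [
    {"country": c, "age": age, "hiv_positive": label}
    for c in ("A", "B", "C", "D")
    for age, label in ((40, 1), (20, 0))
]

mean_f1 = mean_f1_across_countries(toy_records, fit_age_rule)
```

Any classifier with a fit-then-predict shape can be passed in place of `fit_age_rule`. Averaging per-country scores, rather than pooling all predictions, keeps a large country from dominating the result.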