Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa

Abstract. Aim: HIV prevention measures in sub-Saharan Africa still fall short of the UNAIDS 90-90-90 fast-track targets set in 2014. Identifying predictors of HIV status may facilitate targeted screening interventions that improve health care. We aimed at identifying HIV predictors as well a...

Bibliographic Details
Main Authors: Charles K. Mutai, Patrick E. McSharry, Innocent Ngaruye, Edouard Musabanganji
Format: Article
Language: English
Published: BMC 2021-07-01
Series: BMC Medical Research Methodology
Subjects: Socio-behavioral; Screening; High-risk; Predictors; XGBoost
Online Access: https://doi.org/10.1186/s12874-021-01346-2
id doaj-5c43861655044bd2bf4cb4d83d6ef83b
record_format Article
spelling doaj-5c43861655044bd2bf4cb4d83d6ef83b
2021-08-01T11:43:46Z
eng
BMC
BMC Medical Research Methodology
1471-2288
2021-07-01
volume 21, issue 1, pages 1-11
10.1186/s12874-021-01346-2
Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa
Charles K. Mutai (African Center of Excellence in Data Science, University of Rwanda)
Patrick E. McSharry (African Center of Excellence in Data Science, University of Rwanda)
Innocent Ngaruye (College of Science and Technology, University of Rwanda)
Edouard Musabanganji (College of Business and Economics, University of Rwanda)
https://doi.org/10.1186/s12874-021-01346-2
collection DOAJ
language English
format Article
sources DOAJ
author Charles K. Mutai
Patrick E. McSharry
Innocent Ngaruye
Edouard Musabanganji
title Use of machine learning techniques to identify HIV predictors for screening in sub-Saharan Africa
publisher BMC
series BMC Medical Research Methodology
issn 1471-2288
publishDate 2021-07-01
description Abstract. Aim: HIV prevention measures in sub-Saharan Africa still fall short of the UNAIDS 90-90-90 fast-track targets set in 2014. Identifying predictors of HIV status may facilitate targeted screening interventions that improve health care. We aimed to identify HIV predictors and to predict which persons are at high risk of infection. Method: We built machine learning models using Population-based HIV Impact Assessment (PHIA) data for 41,939 male and 45,105 female respondents, with 30 and 40 variables respectively, from four countries in sub-Saharan Africa. We trained and validated the algorithms on 80% of the data and tested on the remaining 20%, rotating the left-out country. The algorithm with the best mean F1 score was retained and retrained on the most predictive variables. We used the model to identify people living with HIV and individuals with a higher likelihood of contracting the disease. Results: The XGBoost algorithm appeared to significantly improve identification of HIV positivity over the other five algorithms, with mean F1 scores of 90% and 92% for males and females respectively. The eight most predictive features in both sexes were: age, relationship with family head, highest level of education, highest grade at that school level, work for payment, avoiding pregnancy, age at first experience of sex, and wealth quintile. Model performance using these variables increased significantly compared with including all variables. We found that five male and 19 female individuals would need to be tested to find one HIV-positive individual. We also predicted that 4.14% of males and 10.81% of females are at high risk of infection. Conclusion: Our findings indicate a potential use of the XGBoost algorithm with socio-behavioural data for identifying HIV predictors and predicting individuals at high risk of infection for targeted screening.
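The abstract reports two quantities that are easy to confuse: the F1 score used to select the algorithm, and the number of individuals who must be tested to find one HIV-positive person. The sketch below shows how both could be derived from confusion-matrix counts. It is illustrative only: the function names and the counts are hypothetical, and treating "number needed to test" as the reciprocal of precision is an assumption, not a definition confirmed by the record.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def number_needed_to_test(tp: int, fp: int) -> float:
    """Individuals flagged by the model per true positive found.

    Assumption: this equals 1 / precision; the record does not say
    how the authors computed their figure.
    """
    if tp == 0:
        return float("inf")
    return (tp + fp) / tp


# Hypothetical counts chosen to keep the arithmetic simple; not study data.
tp, fp, fn = 20, 80, 5
print(f"F1 = {f1_score(tp, fp, fn):.2f}")
print(f"Number needed to test = {number_needed_to_test(tp, fp):.1f}")
```

Under this reading, a model can have a modest precision and still be useful for screening, because the number needed to test is compared against the cost of testing everyone.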
topic Socio-behavioral
Screening
High-risk
Predictors
XGBoost
url https://doi.org/10.1186/s12874-021-01346-2