GenTB: A user-friendly genome-based predictor for tuberculosis resistance powered by machine learning

Abstract. Background: Multidrug-resistant Mycobacterium tuberculosis (Mtb) is a significant global public health threat. Genotypic resistance prediction from Mtb DNA sequences offers an alternative to laboratory-based drug-susceptibility testing. User-friendly and accurate resistance prediction tools...

Bibliographic Details
Main Authors: Matthias I. Gröschel, Martin Owens, Luca Freschi, Roger Vargas, Maximilian G. Marin, Jody Phelan, Zamin Iqbal, Avika Dixit, Maha R. Farhat
Format: Article
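Language: English
Published: BMC 2021-08-01
Series: Genome Medicine
Subjects: Tuberculosis; Drug resistance; Drug-susceptibility testing; Diagnostics; Whole genome sequencing; Machine learning
Online Access: https://doi.org/10.1186/s13073-021-00953-4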
id doaj-f8dcf934241146c6bb0d1c03b6fcf2c7
record_format Article
spelling doaj-f8dcf934241146c6bb0d1c03b6fcf2c7 2021-09-05T11:36:48Z eng BMC Genome Medicine 1756-994X 2021-08-01 13 1 1 14 10.1186/s13073-021-00953-4
GenTB: A user-friendly genome-based predictor for tuberculosis resistance powered by machine learning
Matthias I. Gröschel, Department of Biomedical Informatics, Harvard Medical School
Martin Owens, Department of Biomedical Informatics, Harvard Medical School
Luca Freschi, Department of Biomedical Informatics, Harvard Medical School
Roger Vargas, Department of Biomedical Informatics, Harvard Medical School
Maximilian G. Marin, Department of Biomedical Informatics, Harvard Medical School
Jody Phelan, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine
Zamin Iqbal, European Bioinformatics Institute
Avika Dixit, Department of Biomedical Informatics, Harvard Medical School
Maha R. Farhat, Department of Biomedical Informatics, Harvard Medical School
https://doi.org/10.1186/s13073-021-00953-4
Tuberculosis; Drug resistance; Drug-susceptibility testing; Diagnostics; Whole genome sequencing; Machine learning
collection DOAJ
language English
format Article
sources DOAJ
author Matthias I. Gröschel
Martin Owens
Luca Freschi
Roger Vargas
Maximilian G. Marin
Jody Phelan
Zamin Iqbal
Avika Dixit
Maha R. Farhat
author_sort Matthias I. Gröschel
title GenTB: A user-friendly genome-based predictor for tuberculosis resistance powered by machine learning
publisher BMC
series Genome Medicine
issn 1756-994X
publishDate 2021-08-01
description Abstract
Background: Multidrug-resistant Mycobacterium tuberculosis (Mtb) is a significant global public health threat. Genotypic resistance prediction from Mtb DNA sequences offers an alternative to laboratory-based drug-susceptibility testing. User-friendly and accurate resistance prediction tools are needed to enable public health and clinical practitioners to rapidly diagnose resistance and inform treatment regimens.
Results: We present Translational Genomics platform for Tuberculosis (GenTB), a free and open web-based application to predict antibiotic resistance from next-generation sequence data. The user can choose between two predictors, a Random Forest (RF) classifier and a Wide and Deep Neural Network (WDNN), which predict phenotypic resistance to 13 and 10 anti-tuberculosis drugs, respectively. We benchmark GenTB's predictive performance against leading TB resistance prediction tools (Mykrobe and TB-Profiler) using a ground truth dataset of 20,408 isolates with laboratory-based drug susceptibility data. All four tools reliably predicted resistance to first-line tuberculosis drugs but had varying performance for second-line drugs. The mean sensitivities for GenTB-RF and GenTB-WDNN across the nine shared drugs were 77.6% (95% CI 76.6–78.5%) and 75.4% (95% CI 74.5–76.4%), respectively, marginally higher than the sensitivities of TB-Profiler at 74.4% (95% CI 73.4–75.3%) and Mykrobe at 71.9% (95% CI 70.9–72.9%). The higher sensitivities came at the expense of ≤ 1.5% lower specificity: Mykrobe 97.6% (95% CI 97.5–97.7%), TB-Profiler 96.9% (95% CI 96.7–97.0%), GenTB-WDNN 96.2% (95% CI 96.0–96.4%), and GenTB-RF 96.1% (95% CI 96.0–96.3%). Averaged across the four tools, genotypic resistance sensitivity was 11% and 9% lower for isoniazid and rifampicin, respectively, on isolates sequenced at low depth (< 10× across 95% of the genome), emphasizing the need for quality control of input sequence data before prediction. We discuss differences between tools in reporting results to the user, including variants underlying the resistance calls and any novel or indeterminate variants.
Conclusions: GenTB is an easy-to-use online tool to rapidly and accurately predict resistance to anti-tuberculosis drugs. GenTB can be accessed online at https://gentb.hms.harvard.edu, and the source code is available at https://github.com/farhat-lab/gentb-site.
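The sensitivity and specificity figures in the abstract are proportions with 95% confidence intervals, and the low-depth criterion is a coverage threshold. The following minimal Python sketch illustrates how such quantities can be computed. It is not GenTB's code: the function names, the example counts, and the choice of the Wilson score interval are assumptions made for illustration, and the abstract does not state which interval method the authors used.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    Sensitivity: fraction of phenotypically resistant isolates predicted resistant.
    Specificity: fraction of phenotypically susceptible isolates predicted susceptible.
    """
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%).

    Assumption: the interval method is chosen here for illustration only.
    """
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def passes_depth_qc(depths, min_depth=10, min_fraction=0.95):
    """True if at least min_fraction of positions reach min_depth coverage.

    Assumption: this is one reading of "< 10x across 95% of the genome";
    `depths` is a hypothetical list of per-position read depths.
    """
    if not depths:
        return False
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths) >= min_fraction


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    sens, spec = sensitivity_specificity(tp=80, fn=20, tn=950, fp=50)
    low, high = wilson_interval(80, 100)
    print(f"sensitivity={sens:.3f} (95% CI {low:.3f}-{high:.3f}), specificity={spec:.3f}")
    print("depth QC passed:", passes_depth_qc([12] * 95 + [3] * 5))
```

Running the depth check before prediction mirrors the abstract's recommendation to quality control input sequence data, since sensitivity dropped on low-depth isolates.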
topic Tuberculosis
Drug resistance
Drug-susceptibility testing
Diagnostics
Whole genome sequencing
Machine learning
url https://doi.org/10.1186/s13073-021-00953-4