INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis

Abstract. Motivation: Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods cu...


Bibliographic Details
Main Authors: Hooman Zabeti, Nick Dexter, Amir Hosein Safari, Nafiseh Sedaghat, Maxwell Libbrecht, Leonid Chindelevitch
Format: Article
Language: English
Published: BMC 2021-08-01
Series: Algorithms for Molecular Biology
Subjects:
Online Access: https://doi.org/10.1186/s13015-021-00198-1
id doaj-c2c4edd3d7db414fa088a1c8e32075ae
record_format Article
spelling doaj-c2c4edd3d7db414fa088a1c8e32075ae 2021-08-15T11:03:08Z eng BMC Algorithms for Molecular Biology 1748-7188 2021-08-01 16 1 1 12 10.1186/s13015-021-00198-1
INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis
Hooman Zabeti (School of Computing Science, Simon Fraser University)
Nick Dexter (Department of Mathematics, Simon Fraser University)
Amir Hosein Safari (School of Computing Science, Simon Fraser University)
Nafiseh Sedaghat (School of Computing Science, Simon Fraser University)
Maxwell Libbrecht (School of Computing Science, Simon Fraser University)
Leonid Chindelevitch (Department of Infectious Disease Epidemiology, Imperial College)
Abstract. Motivation: Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit a lower accuracy or lack the flexibility needed to generalize them to previously unseen data. Contribution: In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, which yields highly accurate predictions, interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time. Results: We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has a higher or comparable accuracy to that of commonly used machine learning models, and is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed via the Python Package Index (PyPI) under ingotdr. This package is also compatible with most of the tools in the Scikit-learn machine learning library.
https://doi.org/10.1186/s13015-021-00198-1
Drug resistance
Interpretable machine learning
Group testing
Integer linear programming
Rule-based learning
Whole-genome sequencing
collection DOAJ
language English
format Article
sources DOAJ
author Hooman Zabeti
Nick Dexter
Amir Hosein Safari
Nafiseh Sedaghat
Maxwell Libbrecht
Leonid Chindelevitch
spellingShingle Hooman Zabeti
Nick Dexter
Amir Hosein Safari
Nafiseh Sedaghat
Maxwell Libbrecht
Leonid Chindelevitch
INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis
Algorithms for Molecular Biology
Drug resistance
Interpretable machine learning
Group testing
Integer linear programming
Rule-based learning
Whole-genome sequencing
author_facet Hooman Zabeti
Nick Dexter
Amir Hosein Safari
Nafiseh Sedaghat
Maxwell Libbrecht
Leonid Chindelevitch
author_sort Hooman Zabeti
title INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis
title_short INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis
title_full INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis
title_fullStr INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis
title_full_unstemmed INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis
title_sort ingot-dr: an interpretable classifier for predicting drug resistance in m. tuberculosis
publisher BMC
series Algorithms for Molecular Biology
issn 1748-7188
publishDate 2021-08-01
description Abstract. Motivation: Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit a lower accuracy or lack the flexibility needed to generalize them to previously unseen data. Contribution: In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, which yields highly accurate predictions, interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time. Results: We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has a higher or comparable accuracy to that of commonly used machine learning models, and is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed via the Python Package Index (PyPI) under ingotdr. This package is also compatible with most of the tools in the Scikit-learn machine learning library.
topic Drug resistance
Interpretable machine learning
Group testing
Integer linear programming
Rule-based learning
Whole-genome sequencing
url https://doi.org/10.1186/s13015-021-00198-1
work_keys_str_mv AT hoomanzabeti ingotdraninterpretableclassifierforpredictingdrugresistanceinmtuberculosis
AT nickdexter ingotdraninterpretableclassifierforpredictingdrugresistanceinmtuberculosis
AT amirhoseinsafari ingotdraninterpretableclassifierforpredictingdrugresistanceinmtuberculosis
AT nafisehsedaghat ingotdraninterpretableclassifierforpredictingdrugresistanceinmtuberculosis
AT maxwelllibbrecht ingotdraninterpretableclassifierforpredictingdrugresistanceinmtuberculosis
AT leonidchindelevitch ingotdraninterpretableclassifierforpredictingdrugresistanceinmtuberculosis
_version_ 1721207232236879872