Performance and clinical utility of supervised machine-learning approaches in detecting familial hypercholesterolaemia in primary care

Abstract Familial hypercholesterolaemia (FH) is a common inherited disorder, causing lifelong elevated low-density lipoprotein cholesterol (LDL-C). Most individuals with FH remain undiagnosed, precluding opportunities to prevent premature heart disease and death. Some machine-learning approaches imp...

Bibliographic Details
Main Authors: Ralph K. Akyea, Nadeem Qureshi, Joe Kai, Stephen F. Weng
Format: Article
Language: English
Published: Nature Publishing Group 2020-10-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-020-00349-5
id doaj-1a6aee63d7fb4492b12b4d6a4b286e1b
record_format Article
spelling doaj-1a6aee63d7fb4492b12b4d6a4b286e1b
2021-02-23T09:44:12Z
eng
Nature Publishing Group
npj Digital Medicine
2398-6352
2020-10-01
3119
10.1038/s41746-020-00349-5
Ralph K. Akyea, Nadeem Qureshi, Joe Kai, Stephen F. Weng
Primary Care Stratified Medicine, Division of Primary Care, University of Nottingham (affiliation given for each of the four authors)
collection DOAJ
language English
format Article
sources DOAJ
author Ralph K. Akyea
Nadeem Qureshi
Joe Kai
Stephen F. Weng
title Performance and clinical utility of supervised machine-learning approaches in detecting familial hypercholesterolaemia in primary care
publisher Nature Publishing Group
series npj Digital Medicine
issn 2398-6352
publishDate 2020-10-01
description Abstract Familial hypercholesterolaemia (FH) is a common inherited disorder, causing lifelong elevated low-density lipoprotein cholesterol (LDL-C). Most individuals with FH remain undiagnosed, precluding opportunities to prevent premature heart disease and death. Some machine-learning approaches improve detection of FH in electronic health records, though clinical impact is under-explored. We assessed performance of an array of machine-learning approaches for enhancing detection of FH, and their clinical utility, within a large primary care population. A retrospective cohort study was done using routine primary care clinical records of 4,027,775 individuals from the United Kingdom with total cholesterol measured from 1 January 1999 to 25 June 2019. Predictive accuracy of five common machine-learning algorithms (logistic regression, random forest, gradient boosting machines, neural networks and ensemble learning) were assessed for detecting FH. Predictive accuracy was assessed by area under the receiver operating curves (AUC) and expected vs observed calibration slope; with clinical utility assessed by expected case-review workload and likelihood ratios. There were 7928 incident diagnoses of FH. In addition to known clinical features of FH (raised total cholesterol or LDL-C and family history of premature coronary heart disease), machine-learning (ML) algorithms identified features such as raised triglycerides which reduced the likelihood of FH. Apart from logistic regression (AUC, 0.81), all four other ML approaches had similarly high predictive accuracy (AUC > 0.89). Calibration slope ranged from 0.997 for gradient boosting machines to 1.857 for logistic regression. Among those screened, high probability cases requiring clinical review varied from 0.73% using ensemble learning to 10.16% using deep learning, but with positive predictive values of 15.5% and 2.8% respectively. Ensemble learning exhibited a dominant positive likelihood ratio (45.5) compared to all other ML models (7.0–14.4). Machine-learning models show similar high accuracy in detecting FH, offering opportunities to increase diagnosis. However, the clinical case-finding workload required for yield of cases will differ substantially between models.
url https://doi.org/10.1038/s41746-020-00349-5
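
The abstract's closing point, that case-finding workload per case found differs between models, can be illustrated with simple arithmetic on the figures it reports. The sketch below is not the authors' analysis: the cohort size of 100,000 screened patients, the function name review_workload and the MODELS dictionary are assumptions introduced here for illustration, and the calculation assumes the reported positive predictive value applies uniformly to everyone flagged.

```python
# Figures quoted in the abstract: share of screened patients flagged as
# high probability (needing clinical review) and the positive predictive
# value (PPV) among those flagged.
MODELS = {
    "ensemble learning": {"flagged": 0.0073, "ppv": 0.155},
    "deep learning": {"flagged": 0.1016, "ppv": 0.028},
}


def review_workload(flagged, ppv, screened=100_000):
    """Return (reviews, expected_cases, reviews_per_case).

    `screened` is a hypothetical cohort size chosen for illustration,
    not a number taken from the study.
    """
    reviews = flagged * screened      # patients sent for clinical review
    expected_cases = reviews * ppv    # reviews expected to confirm FH
    reviews_per_case = 1.0 / ppv      # reviews needed per case found
    return reviews, expected_cases, reviews_per_case


if __name__ == "__main__":
    for name, m in MODELS.items():
        reviews, cases, per_case = review_workload(m["flagged"], m["ppv"])
        print(f"{name}: {reviews:.0f} reviews, about {cases:.0f} cases, "
              f"{per_case:.1f} reviews per case")
```

Under these assumptions, ensemble learning would flag roughly 730 patients per 100,000 screened for about 113 expected cases, whereas deep learning would flag roughly 10,160 for about 284 expected cases, so the second approach needs several times more reviews for each case it finds.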