Using electronic health records to develop and validate a machine-learning tool to predict type 2 diabetes outcomes: a study protocol

Bibliographic Details
Main Authors: Ara Darzi, Ben Glampson, Abdulrahim Mulla, Ana Luísa Neves, Tony Willis, Erik Mayer, Pedro Pereira Rodrigues
Format: Article
Language: English
Published: BMJ Publishing Group 2021-07-01
Series: BMJ Open
Online Access:https://bmjopen.bmj.com/content/11/7/e046716.full
id doaj-05f05796722b49fa851e5b57c0abe57f
doi 10.1136/bmjopen-2020-046716
volume 11
issue 7
record_date 2021-08-07T16:34:25Z
affiliation Ara Darzi: NIHR Imperial Patient Safety Translational Research Centre, Imperial College London, London, UK
Ben Glampson: Imperial College Healthcare NHS Trust, London, UK
Abdulrahim Mulla: Imperial College Healthcare NHS Trust, London, UK
Ana Luísa Neves: NIHR Imperial Patient Safety Translational Research Centre, Imperial College London, London, UK
Tony Willis: North West London Diabetes Transformation Programme, North West London Health and Care Partnership, London, UK
Erik Mayer: NIHR Imperial Patient Safety Translational Research Centre, Imperial College London, London, UK
Pedro Pereira Rodrigues: Center for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto, Portugal
collection DOAJ
issn 2044-6055
description Introduction: Type 2 diabetes mellitus (T2DM) is a major cause of blindness, kidney failure, myocardial infarction, stroke and lower limb amputation. However, we are still unable to accurately predict or identify which patients are at higher risk of deterioration. Most risk stratification tools do not account for novel factors such as sociodemographic determinants, self-management ability or access to healthcare. Additionally, most tools are derived from clinical trials, which limits their external generalisability.
Objective: The aim of this work is to design and validate a machine learning-based tool to identify patients with T2DM at high risk of clinical deterioration, based on a comprehensive set of patient-level characteristics retrieved from a population health linked dataset.
Sample and design: Retrospective cohort study of patients with a diagnosis of T2DM on 1 January 2015, with a 5-year follow-up. Anonymised electronic healthcare records from the Whole System Integrated Care (WSIC) database will be used.
Preliminary outcomes: Outcome variables of clinical deterioration will include retinopathy, chronic renal disease, myocardial infarction, stroke, peripheral arterial disease or death. Predictor variables will include sociodemographic and geographic data, patients’ ability to self-manage their disease, clinical and metabolic parameters, and healthcare service usage. Prognostic models will be defined using multidependence Bayesian networks. The derivation cohort, comprising 80% of the patients, will be used to define the prognostic models. Model parameters will be internally validated by comparing the area under the receiver operating characteristic curve (AUC) in the derivation cohort with the AUCs obtained from leave-one-out and 10 times twofold cross-validation.
Ethics and dissemination: The study has received approval from the Information Governance Committee at the WSIC. Results will be made available to people with T2DM, their caregivers, the funders, diabetes care societies and other researchers.
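The internal validation scheme in the description can be illustrated with a short sketch. The scheme compares the apparent area under the ROC curve (AUC) in the derivation cohort with leave-one-out and 10 times twofold cross-validated AUCs. This is not the authors' implementation. It makes several assumptions:

* A Gaussian naive Bayes classifier (the simplest Bayesian network classifier) stands in for the multidependence Bayesian networks named in the protocol.
* The data are synthetic, not WSIC records.
* All function names (`fit_gaussian_nb`, `repeated_twofold_auc`, `loo_auc`, and the others) are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_gaussian_nb(X, y):
    """Hypothetical stand-in model: Gaussian naive Bayes with per-class prior,
    per-feature mean and standard deviation (not a multidependence network)."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        if not rows:
            raise ValueError(f"no training rows for class {c}")
        cols = list(zip(*rows))
        model[c] = (
            len(rows) / len(X),
            [mean(col) for col in cols],
            [max(pstdev(col), 1e-6) for col in cols],  # floor avoids a zero SD
        )
    return model


def score(model, x):
    """Log posterior odds of class 1 (deterioration) versus class 0."""
    logp = {}
    for c, (prior, mus, sds) in model.items():
        logp[c] = math.log(prior) + sum(
            -math.log(sd * math.sqrt(2 * math.pi)) - (v - mu) ** 2 / (2 * sd ** 2)
            for v, mu, sd in zip(x, mus, sds)
        )
    return logp[1] - logp[0]


def stratified_halves(y, rng):
    """Split indices into two folds with a similar outcome ratio in each."""
    folds = ([], [])
    for c in (0, 1):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            folds[k % 2].append(i)
    return folds


def repeated_twofold_auc(X, y, repeats=10, seed=0):
    """Mean AUC over `repeats` rounds of stratified twofold cross-validation."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        a, b = stratified_halves(y, rng)
        for train, test in ((a, b), (b, a)):
            model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
            aucs.append(auc([y[i] for i in test], [score(model, X[i]) for i in test]))
    return mean(aucs)


def loo_auc(X, y):
    """Leave-one-out: score each patient with a model fitted to all the others,
    then compute a single AUC over the pooled held-out scores."""
    held_out = []
    for i in range(len(X)):
        train = [j for j in range(len(X)) if j != i]
        model = fit_gaussian_nb([X[j] for j in train], [y[j] for j in train])
        held_out.append(score(model, X[i]))
    return auc(y, held_out)


if __name__ == "__main__":
    # Synthetic stand-in for a derivation cohort: two numeric predictors,
    # binary outcome (1 = clinical deterioration). Not WSIC data.
    rng = random.Random(42)
    y = [i % 2 for i in range(200)]
    X = [[rng.gauss(1.0 * c, 1.0), rng.gauss(0.5 * c, 1.0)] for c in y]
    apparent = auc(y, [score(fit_gaussian_nb(X, y), x) for x in X])
    print(f"apparent AUC (derivation cohort): {apparent:.3f}")
    print(f"leave-one-out AUC:                {loo_auc(X, y):.3f}")
    print(f"10 x twofold CV AUC:              {repeated_twofold_auc(X, y):.3f}")
```

The usual reading of such a comparison is that a large gap between the apparent AUC and the cross-validated AUCs points to overfitting. That is the general rationale for this kind of internal validation, not a finding reported in the protocol.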