Development of An Individualized Risk Prediction Model for COVID-19 Using Electronic Health Record Data
Developing an accurate and interpretable model to predict an individual’s risk for Coronavirus Disease 2019 (COVID-19) is a critical step to efficiently triage testing and other scarce preventative resources. To aid in this effort, we have developed an interpretable risk calculator that utilized de-...
Main Authors: | Tarun Karthik Kumar Mamidi, Thi K. Tran-Nguyen, Ryan L. Melvin, Elizabeth A. Worthey |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2021-06-01 |
Series: | Frontiers in Big Data |
Subjects: | COVID-19, electronic health record, risk prediction, ICD-10, credit scorecard model |
Online Access: | https://www.frontiersin.org/articles/10.3389/fdata.2021.675882/full |
id |
doaj-d939237654584a02ba0dcf3529dde537 |
---|---|
record_format |
Article |
spelling |
doaj-d939237654584a02ba0dcf3529dde537; 2021-06-04T04:57:03Z; eng; Frontiers Media S.A.; Frontiers in Big Data; 2624-909X; 2021-06-01; 4; 10.3389/fdata.2021.675882; 675882
Development of An Individualized Risk Prediction Model for COVID-19 Using Electronic Health Record Data
Tarun Karthik Kumar Mamidi [0]; Thi K. Tran-Nguyen [1]; Ryan L. Melvin [2]; Elizabeth A. Worthey [3, 4]
[0] Center for Computational Genomics and Data Science, Departments of Pediatrics and Pathology, University of Alabama at Birmingham School of Medicine, Birmingham, AL, United States
[1] Hugh Kaul Precision Medicine Institute, University of Alabama at Birmingham, Birmingham, AL, United States
[2] Department of Anesthesiology and Perioperative Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
[3] Center for Computational Genomics and Data Science, Departments of Pediatrics and Pathology, University of Alabama at Birmingham School of Medicine, Birmingham, AL, United States
[4] Hugh Kaul Precision Medicine Institute, University of Alabama at Birmingham, Birmingham, AL, United States
Developing an accurate and interpretable model to predict an individual’s risk for Coronavirus Disease 2019 (COVID-19) is a critical step to efficiently triage testing and other scarce preventative resources. To aid in this effort, we have developed an interpretable risk calculator that utilized de-identified electronic health records (EHR) from the University of Alabama at Birmingham Informatics for Integrating Biology and the Bedside (UAB-i2b2) COVID-19 repository under the U-BRITE framework. The generated risk scores are analogous to commonly used credit scores where higher scores indicate higher risks for COVID-19 infection. By design, these risk scores can easily be calculated in spreadsheets or even with pen and paper. To predict risk, we implemented a Credit Scorecard modeling approach on longitudinal EHR data from 7,262 patients enrolled in the UAB Health System who were evaluated and/or tested for COVID-19 between January and June 2020. In this cohort, 912 patients were positive for COVID-19. Our workflow considered the timing of symptoms and medical conditions and tested the effects by applying different variable selection techniques such as LASSO and Elastic-Net. Within the two weeks before a COVID-19 diagnosis, the most predictive features were respiratory symptoms such as cough, abnormalities of breathing, pain in the throat and chest as well as other chronic conditions including nicotine dependence and major depressive disorder. When extending the timeframe to include all medical conditions across all time, our models also uncovered several chronic conditions impacting the respiratory, cardiovascular, central nervous and urinary organ systems. The whole pipeline of data processing, risk modeling and web-based risk calculator can be applied to any EHR data following the OMOP common data format. The results can be employed to generate questionnaires to estimate COVID-19 risk for screening in building entries or to optimize hospital resources.
https://www.frontiersin.org/articles/10.3389/fdata.2021.675882/full
COVID-19; electronic health record; risk prediction; ICD-10; credit scorecard model |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Tarun Karthik Kumar Mamidi; Thi K. Tran-Nguyen; Ryan L. Melvin; Elizabeth A. Worthey |
spellingShingle |
Tarun Karthik Kumar Mamidi; Thi K. Tran-Nguyen; Ryan L. Melvin; Elizabeth A. Worthey; Development of An Individualized Risk Prediction Model for COVID-19 Using Electronic Health Record Data; Frontiers in Big Data; COVID-19; electronic health record; risk prediction; ICD-10; credit scorecard model |
author_facet |
Tarun Karthik Kumar Mamidi; Thi K. Tran-Nguyen; Ryan L. Melvin; Elizabeth A. Worthey |
author_sort |
Tarun Karthik Kumar Mamidi |
title |
Development of An Individualized Risk Prediction Model for COVID-19 Using Electronic Health Record Data |
title_short |
Development of An Individualized Risk Prediction Model for COVID-19 Using Electronic Health Record Data |
title_full |
Development of An Individualized Risk Prediction Model for COVID-19 Using Electronic Health Record Data |
title_fullStr |
Development of An Individualized Risk Prediction Model for COVID-19 Using Electronic Health Record Data |
title_full_unstemmed |
Development of An Individualized Risk Prediction Model for COVID-19 Using Electronic Health Record Data |
title_sort |
development of an individualized risk prediction model for covid-19 using electronic health record data |
publisher |
Frontiers Media S.A. |
series |
Frontiers in Big Data |
issn |
2624-909X |
publishDate |
2021-06-01 |
description |
Developing an accurate and interpretable model to predict an individual’s risk for Coronavirus Disease 2019 (COVID-19) is a critical step to efficiently triage testing and other scarce preventative resources. To aid in this effort, we have developed an interpretable risk calculator that utilized de-identified electronic health records (EHR) from the University of Alabama at Birmingham Informatics for Integrating Biology and the Bedside (UAB-i2b2) COVID-19 repository under the U-BRITE framework. The generated risk scores are analogous to commonly used credit scores where higher scores indicate higher risks for COVID-19 infection. By design, these risk scores can easily be calculated in spreadsheets or even with pen and paper. To predict risk, we implemented a Credit Scorecard modeling approach on longitudinal EHR data from 7,262 patients enrolled in the UAB Health System who were evaluated and/or tested for COVID-19 between January and June 2020. In this cohort, 912 patients were positive for COVID-19. Our workflow considered the timing of symptoms and medical conditions and tested the effects by applying different variable selection techniques such as LASSO and Elastic-Net. Within the two weeks before a COVID-19 diagnosis, the most predictive features were respiratory symptoms such as cough, abnormalities of breathing, pain in the throat and chest as well as other chronic conditions including nicotine dependence and major depressive disorder. When extending the timeframe to include all medical conditions across all time, our models also uncovered several chronic conditions impacting the respiratory, cardiovascular, central nervous and urinary organ systems. The whole pipeline of data processing, risk modeling and web-based risk calculator can be applied to any EHR data following the OMOP common data format. The results can be employed to generate questionnaires to estimate COVID-19 risk for screening in building entries or to optimize hospital resources. |
topic |
COVID-19; electronic health record; risk prediction; ICD-10; credit scorecard model |
url |
https://www.frontiersin.org/articles/10.3389/fdata.2021.675882/full |
work_keys_str_mv |
AT tarunkarthikkumarmamidi developmentofanindividualizedriskpredictionmodelforcovid19usingelectronichealthrecorddata AT thiktrannguyen developmentofanindividualizedriskpredictionmodelforcovid19usingelectronichealthrecorddata AT ryanlmelvin developmentofanindividualizedriskpredictionmodelforcovid19usingelectronichealthrecorddata AT elizabethaworthey developmentofanindividualizedriskpredictionmodelforcovid19usingelectronichealthrecorddata AT elizabethaworthey developmentofanindividualizedriskpredictionmodelforcovid19usingelectronichealthrecorddata |
_version_ |
1721398507496013824 |
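
The description in this record says the risk scores behave like credit scores and can be worked out in a spreadsheet or by hand. The following is a minimal Python sketch of how an additive scorecard of that kind is often scaled. It assumes the conventional "points to double the odds" scaling from credit scoring, which this record does not confirm the authors used. The base score, scaling constant, intercept, and coefficients are hypothetical placeholders, not values from the article; only the feature names and the cohort counts (912 positives among 7,262 patients) come from the abstract.

```python
import math

# Hypothetical scaling constants (not taken from the article).
BASE_SCORE = 500.0   # score for a patient with none of the listed features
PDO = 50.0           # points that correspond to a doubling of the odds

# Cohort counts quoted in the abstract: 912 positives among 7,262 patients.
BASE_ODDS = 912 / (7262 - 912)

# Standard scorecard scaling: score = OFFSET + FACTOR * ln(odds of a positive test).
FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)

# Assumption: the model intercept equals the cohort log-odds.
INTERCEPT = math.log(BASE_ODDS)

# Hypothetical log-odds coefficients; names follow the abstract, values are made up.
COEFFICIENTS = {
    "cough": 0.9,
    "abnormalities_of_breathing": 0.6,
    "throat_or_chest_pain": 0.3,
}


def points_table(coefficients, factor=FACTOR):
    """Convert log-odds coefficients into whole-number points per feature."""
    return {name: round(factor * beta) for name, beta in coefficients.items()}


def risk_score(present_features, table, base_points):
    """Add the points of every feature the patient has to the base points."""
    return base_points + sum(table[name] for name in present_features)


BASE_POINTS = round(OFFSET + FACTOR * INTERCEPT)
POINTS = points_table(COEFFICIENTS)

if __name__ == "__main__":
    print("points per feature:", POINTS)
    print("example score:", risk_score(["cough", "throat_or_chest_pain"], POINTS, BASE_POINTS))
```

Because every feature maps to a fixed integer, the final score is a plain sum, which is what makes this style of model practical for a paper questionnaire. Higher totals indicate higher estimated risk, matching the convention stated in the description.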