The application of unsupervised deep learning in predictive models using electronic health records

Abstract. Background: The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by an autoencoder, an unsupervised deep learning algorithm, in predictive modeling. Since autoencoder features are unsupervised, this paper focuses...

Bibliographic Details
Main Authors: Lei Wang, Liping Tong, Darcy Davis, Tim Arnold, Tina Esposito
Format: Article
Language: English
Published: BMC 2020-02-01
Series: BMC Medical Research Methodology
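Subjects: Autoencoder; LASSO; Enhanced Reg; Predictive model; Predictive performance; Important response-specific predictors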
Online Access: http://link.springer.com/article/10.1186/s12874-020-00923-1
id doaj-0aaf3b32a04f42c9aeb58b9b7ae985a8
record_format Article
affiliation Lei Wang: School of Statistics, Renmin University of China
Liping Tong: Advocate Aurora Health
Darcy Davis: Advocate Aurora Health
Tim Arnold: Cerner Corporation
Tina Esposito: Advocate Aurora Health
citation BMC Medical Research Methodology, volume 20, issue 1, pages 1-9, 2020-02-01, doi:10.1186/s12874-020-00923-1
collection DOAJ
language English
format Article
sources DOAJ
author Lei Wang
Liping Tong
Darcy Davis
Tim Arnold
Tina Esposito
title The application of unsupervised deep learning in predictive models using electronic health records
publisher BMC
series BMC Medical Research Methodology
issn 1471-2288
publishDate 2020-02-01
description Abstract
Background: The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by an autoencoder, an unsupervised deep learning algorithm, in predictive modeling. Since autoencoder features are unsupervised, this paper focuses on their general lower-dimensional representation of EHR information in a wide variety of predictive tasks.
Methods: We compare the model with autoencoder features to traditional models: a logistic model with the least absolute shrinkage and selection operator (LASSO) and the Random Forest algorithm. In addition, we include a predictive model using a small subset of response-specific variables (Simple Reg) and a model combining these variables with features from the autoencoder (Enhanced Reg). We performed the study first on simulated data that mimics real-world EHR data and then on actual EHR data from eight Advocate hospitals.
Results: On simulated data with incorrect categories and missing data, the precision for the autoencoder is 24.16% with recall fixed at 0.7, which is higher than Random Forest (23.61%) and lower than LASSO (25.32%). The precision is 20.92% in Simple Reg and improves to 24.89% in Enhanced Reg. When using real EHR data to predict the 30-day readmission rate, the precision of the autoencoder is 19.04%, which again is higher than Random Forest (18.48%) and lower than LASSO (19.70%). The precisions for Simple Reg and Enhanced Reg are 18.70% and 19.69%, respectively. That is, Enhanced Reg can have competitive prediction performance compared to LASSO. In addition, results show that Enhanced Reg usually relies on fewer features under the simulation settings of this paper.
Conclusions: We conclude that an autoencoder can create useful features that represent the entire space of EHR data and are applicable to a wide array of predictive tasks. Combined with important response-specific predictors, these features can yield efficient and robust predictive models with less labor in data extraction and model training.
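The results in the abstract are reported as precision with recall fixed at 0.7. The record does not say how the authors computed this, so the following is only a minimal sketch of one common way to do it: rank patients by predicted score, walk down the ranking until the target recall is first reached, and report precision at that cutoff. The function name `precision_at_recall`, the toy labels and scores, and the handling of tied scores (broken by input order) are all assumptions made for illustration.

```python
import numpy as np


def precision_at_recall(y_true, scores, target_recall=0.7):
    """Precision at the shortest top-k cutoff whose recall reaches target_recall.

    Hypothetical helper, not taken from the paper. Tied scores are kept in
    input order, which is an assumption of this sketch.
    """
    y_true = np.asarray(y_true, dtype=int)
    scores = np.asarray(scores, dtype=float)
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    n_pos = int(y_true.sum())
    if n_pos == 0:
        raise ValueError("need at least one positive label")

    # Sort cases from highest to lowest predicted score.
    order = np.argsort(-scores, kind="stable")
    # hits[k-1] = number of true positives among the top k cases.
    hits = np.cumsum(y_true[order])
    k = np.arange(1, len(y_true) + 1)
    recall = hits / n_pos
    precision = hits / k

    # recall is non-decreasing, so a binary search finds the first cutoff
    # at which it reaches the target.
    idx = int(np.searchsorted(recall, target_recall, side="left"))
    return float(precision[idx])


# Toy example: 4 readmissions among 10 patients (made-up data).
labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
preds = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
print(precision_at_recall(labels, preds, target_recall=0.7))  # 0.75
```

In the toy example, recall first reaches at least 0.7 after the top four patients (three of the four readmissions found), so the reported precision is 3/4. Fixing recall this way lets models with different score scales, such as LASSO, Random Forest and a regression on autoencoder features, be compared at the same operating point.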
topic Autoencoder
LASSO
Enhanced Reg
Predictive model
Predictive performance
Important response-specific predictors
url http://link.springer.com/article/10.1186/s12874-020-00923-1