The application of unsupervised deep learning in predictive models using electronic health records
Abstract Background The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by the unsupervised deep learning algorithm autoencoder, in predictive modeling. Since autoencoder features are unsupervised, this paper focuses...
Main Authors: | Lei Wang, Liping Tong, Darcy Davis, Tim Arnold, Tina Esposito |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2020-02-01 |
Series: | BMC Medical Research Methodology |
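Subjects: | Autoencoder, LASSO, Enhanced Reg, Predictive model, Predictive performance, Important response-specific predictors |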
Online Access: | http://link.springer.com/article/10.1186/s12874-020-00923-1 |
id |
doaj-0aaf3b32a04f42c9aeb58b9b7ae985a8 |
---|---|
record_format |
Article |
spelling |
doaj-0aaf3b32a04f42c9aeb58b9b7ae985a8; 2020-11-25T02:08:41Z; eng; BMC; BMC Medical Research Methodology; 1471-2288; 2020-02-01; 20119; 10.1186/s12874-020-00923-1; The application of unsupervised deep learning in predictive models using electronic health records; Lei Wang (0), Liping Tong (1), Darcy Davis (2), Tim Arnold (3), Tina Esposito (4); School of Statistics, Renmin University of China; Advocate Aurora Health; Advocate Aurora Health; Cerner Corporation; Advocate Aurora Health; Abstract Background The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by the unsupervised deep learning algorithm autoencoder, in predictive modeling. Since autoencoder features are unsupervised, this paper focuses on their general lower-dimensional representation of EHR information in a wide variety of predictive tasks. Methods We compare the model with autoencoder features to traditional models: logistic model with least absolute shrinkage and selection operator (LASSO) and Random Forest algorithm. In addition, we include a predictive model using a small subset of response-specific variables (Simple Reg) and a model combining these variables with features from autoencoder (Enhanced Reg). We performed the study first on simulated data that mimics real world EHR data and then on actual EHR data from eight Advocate hospitals. Results On simulated data with incorrect categories and missing data, the precision for autoencoder is 24.16% when fixing recall at 0.7, which is higher than Random Forest (23.61%) and lower than LASSO (25.32%). The precision is 20.92% in Simple Reg and improves to 24.89% in Enhanced Reg. When using real EHR data to predict the 30-day readmission rate, the precision of autoencoder is 19.04%, which again is higher than Random Forest (18.48%) and lower than LASSO (19.70%). The precisions for Simple Reg and Enhanced Reg are 18.70 and 19.69% respectively. That is, Enhanced Reg can have competitive prediction performance compared to LASSO. In addition, results show that Enhanced Reg usually relies on fewer features under the setting of simulations of this paper. Conclusions We conclude that autoencoder can create useful features representing the entire space of EHR data and which are applicable to a wide array of predictive tasks. Together with important response-specific predictors, we can derive efficient and robust predictive models with less labor in data extraction and model training.; http://link.springer.com/article/10.1186/s12874-020-00923-1; Autoencoder; LASSO; Enhanced Reg; Predictive model; Predictive performance; Important response-specific predictors |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Lei Wang; Liping Tong; Darcy Davis; Tim Arnold; Tina Esposito |
author_facet |
Lei Wang; Liping Tong; Darcy Davis; Tim Arnold; Tina Esposito |
author_sort |
Lei Wang |
title |
The application of unsupervised deep learning in predictive models using electronic health records |
title_short |
The application of unsupervised deep learning in predictive models using electronic health records |
title_full |
The application of unsupervised deep learning in predictive models using electronic health records |
title_fullStr |
The application of unsupervised deep learning in predictive models using electronic health records |
title_full_unstemmed |
The application of unsupervised deep learning in predictive models using electronic health records |
title_sort |
application of unsupervised deep learning in predictive models using electronic health records |
publisher |
BMC |
series |
BMC Medical Research Methodology |
issn |
1471-2288 |
publishDate |
2020-02-01 |
description |
Abstract. Background: The main goal of this study is to explore the use of features that represent patient-level electronic health record (EHR) data, generated by an autoencoder (an unsupervised deep learning algorithm), in predictive modeling. Since autoencoder features are unsupervised, this paper focuses on their general lower-dimensional representation of EHR information in a wide variety of predictive tasks. Methods: We compare the model with autoencoder features to traditional models: a logistic model with the least absolute shrinkage and selection operator (LASSO) and the Random Forest algorithm. In addition, we include a predictive model using a small subset of response-specific variables (Simple Reg) and a model combining these variables with features from the autoencoder (Enhanced Reg). We performed the study first on simulated data that mimics real-world EHR data and then on actual EHR data from eight Advocate hospitals. Results: On simulated data with incorrect categories and missing data, the precision for the autoencoder is 24.16% when fixing recall at 0.7, which is higher than Random Forest (23.61%) and lower than LASSO (25.32%). The precision is 20.92% in Simple Reg and improves to 24.89% in Enhanced Reg. When using real EHR data to predict the 30-day readmission rate, the precision of the autoencoder is 19.04%, which again is higher than Random Forest (18.48%) and lower than LASSO (19.70%). The precisions for Simple Reg and Enhanced Reg are 18.70% and 19.69%, respectively. That is, Enhanced Reg can achieve prediction performance competitive with LASSO. In addition, results show that Enhanced Reg usually relies on fewer features under the simulation settings of this paper. Conclusions: We conclude that an autoencoder can create useful features that represent the entire space of EHR data and are applicable to a wide array of predictive tasks. Together with important response-specific predictors, we can derive efficient and robust predictive models with less labor in data extraction and model training. |
topic |
Autoencoder; LASSO; Enhanced Reg; Predictive model; Predictive performance; Important response-specific predictors |
url |
http://link.springer.com/article/10.1186/s12874-020-00923-1 |