The application of unsupervised deep learning in predictive models using electronic health records

Abstract. Background: The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by the unsupervised deep learning algorithm autoencoder, in predictive modeling. Since autoencoder features are unsupervised, this paper focuses...

Bibliographic Details
Main Authors:	Lei Wang, Liping Tong, Darcy Davis, Tim Arnold, Tina Esposito
Format:	Article
Language:	English
Published:	BMC 2020-02-01
Series:	BMC Medical Research Methodology
Subjects:	Autoencoder; LASSO; Enhanced Reg; Predictive model; Predictive performance; Important response-specific predictors
Online Access:	http://link.springer.com/article/10.1186/s12874-020-00923-1

id	doaj-0aaf3b32a04f42c9aeb58b9b7ae985a8
record_format	Article
spelling	doaj-0aaf3b32a04f42c9aeb58b9b7ae985a8; 2020-11-25T02:08:41Z; eng; BMC; BMC Medical Research Methodology; 1471-2288; 2020-02-01; vol. 20, no. 1, pp. 1-9; 10.1186/s12874-020-00923-1; The application of unsupervised deep learning in predictive models using electronic health records; Lei Wang (School of Statistics, Renmin University of China); Liping Tong (Advocate Aurora Health); Darcy Davis (Advocate Aurora Health); Tim Arnold (Cerner Corporation); Tina Esposito (Advocate Aurora Health); http://link.springer.com/article/10.1186/s12874-020-00923-1; Autoencoder; LASSO; Enhanced Reg; Predictive model; Predictive performance; Important response-specific predictors
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Lei Wang, Liping Tong, Darcy Davis, Tim Arnold, Tina Esposito
author_sort	Lei Wang
title	The application of unsupervised deep learning in predictive models using electronic health records
title_sort	application of unsupervised deep learning in predictive models using electronic health records
publisher	BMC
series	BMC Medical Research Methodology
issn	1471-2288
publishDate	2020-02-01
description	Abstract. Background: The main goal of this study is to explore the use of features representing patient-level electronic health record (EHR) data, generated by the unsupervised deep learning algorithm autoencoder, in predictive modeling. Since autoencoder features are unsupervised, this paper focuses on their general lower-dimensional representation of EHR information in a wide variety of predictive tasks. Methods: We compare the model with autoencoder features to traditional models: a logistic model with least absolute shrinkage and selection operator (LASSO) and the Random Forest algorithm. In addition, we include a predictive model using a small subset of response-specific variables (Simple Reg) and a model combining these variables with features from autoencoder (Enhanced Reg). We performed the study first on simulated data that mimics real-world EHR data and then on actual EHR data from eight Advocate hospitals. Results: On simulated data with incorrect categories and missing data, the precision for autoencoder is 24.16% when fixing recall at 0.7, which is higher than Random Forest (23.61%) and lower than LASSO (25.32%). The precision is 20.92% in Simple Reg and improves to 24.89% in Enhanced Reg. When using real EHR data to predict the 30-day readmission rate, the precision of autoencoder is 19.04%, which again is higher than Random Forest (18.48%) and lower than LASSO (19.70%). The precisions for Simple Reg and Enhanced Reg are 18.70% and 19.69%, respectively. That is, Enhanced Reg can have competitive prediction performance compared to LASSO. In addition, results show that Enhanced Reg usually relies on fewer features under the simulation settings of this paper. Conclusions: We conclude that autoencoder can create useful features that represent the entire space of EHR data and are applicable to a wide array of predictive tasks. Together with important response-specific predictors, we can derive efficient and robust predictive models with less labor in data extraction and model training.
topic	Autoencoder; LASSO; Enhanced Reg; Predictive model; Predictive performance; Important response-specific predictors
url	http://link.springer.com/article/10.1186/s12874-020-00923-1
_version_	1724926006995714048
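
The abstract reports each model's precision "when fixing recall at 0.7". The record does not say how the authors computed this, so the following is only a minimal sketch of one common reading: rank cases by predicted risk, lower the threshold until at least 70% of true positives are flagged, and report the precision at that point. The function name `precision_at_recall` and the toy labels and scores are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def precision_at_recall(
    y_true: Sequence[int],
    scores: Sequence[float],
    target_recall: float = 0.7,
) -> float:
    """Precision at the highest score threshold whose recall reaches target_recall.

    Hypothetical helper: an assumed reading of "precision when fixing
    recall at 0.7", not the authors' published procedure.
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank cases from highest to lowest predicted risk.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    i = 0
    while i < len(ranked):
        # Cases with tied scores share a threshold, so flag them together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            true_pos += ranked[j][1]
            j += 1
        if true_pos / total_pos >= target_recall:
            return true_pos / j  # j cases are flagged at this threshold
        i = j
    # Not reached for target_recall <= 1: recall is 1.0 once all cases are flagged.
    return true_pos / len(ranked)


# Toy data (made up for illustration): 4 positives among 10 cases.
labels = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
risk = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
result = precision_at_recall(labels, risk, target_recall=0.7)
```

On the toy data, recall first reaches 0.7 once the top five cases are flagged (3 of 4 positives found), so the precision there is 3/5. Fixing recall puts models on a common footing: each is asked to catch the same share of true cases, and precision then shows how many flagged cases were correct.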

Similar Items