Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients With COVID-19: Machine Learning Approach

Bibliographic Details
Main Authors: Vaid, Akhil, Jaladanki, Suraj K, Xu, Jie, Teng, Shelly, Kumar, Arvind, Lee, Samuel, Somani, Sulaiman, Paranjpe, Ishan, De Freitas, Jessica K, Wanyan, Tingyi, Johnson, Kipp W, Bicak, Mesude, Klang, Eyal, Kwon, Young Joon, Costa, Anthony, Zhao, Shan, Miotto, Riccardo, Charney, Alexander W, Böttinger, Erwin, Fayad, Zahi A, Nadkarni, Girish N, Wang, Fei, Glicksberg, Benjamin S
Format: Article
Language: English
Published: JMIR Publications 2021-01-01
Series: JMIR Medical Informatics
Online Access: http://medinform.jmir.org/2021/1/e24207/
Description
Summary:
Background: Machine learning models require large datasets that may be siloed across different health care institutions. Machine learning studies that focus on COVID-19 have been limited to single-hospital data, which limits model generalizability.
Objective: We aimed to use federated learning, a machine learning technique that avoids pooling raw clinical data from multiple institutions in one place, to predict mortality within 7 days in hospitalized patients with COVID-19.
Methods: Patient data were collected from the electronic health records of 5 hospitals within the Mount Sinai Health System. Logistic regression with L1 regularization (least absolute shrinkage and selection operator, LASSO) and multilayer perceptron (MLP) models were trained on local data at each site. We also developed a pooled model trained on combined data from all 5 sites and a federated model that shared only model parameters with a central aggregator.
Results: As measured by the area under the receiver operating characteristic curve (AUROC), the federated LASSO model outperformed the local LASSO model at 3 hospitals, and the federated MLP model outperformed the local MLP model at all 5 hospitals. The pooled LASSO model outperformed the federated LASSO model at all hospitals, and the federated MLP model outperformed the pooled MLP model at 2 hospitals.
Conclusions: Federated learning on COVID-19 electronic health record data shows promise for developing robust predictive models without compromising patient privacy.
ISSN: 2291-9694
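The summary describes a federated model in which sites share only parameters with a central aggregator. The Python code below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes federated averaging (FedAvg) with sample-size weighting and proximal-gradient training for the L1 (LASSO) penalty; neither detail is given in this record. It trains an L1-regularized logistic regression across five simulated sites and scores each site by AUROC. The data are synthetic, and all function names (`local_update`, `federated_lasso`, `auroc`, `soft_threshold`) are hypothetical, so the numbers it prints say nothing about the study's results.

```python
import numpy as np

# ASSUMPTION: FedAvg aggregation + proximal gradient (ISTA) for the L1 penalty.
# Synthetic data only; this does not reproduce the study's models or results.

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrinks weights toward zero (LASSO behavior)
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def local_update(w, b, X, y, lam=0.01, lr=0.1, epochs=20):
    """Train on one site's data, starting from the current global parameters."""
    w = w.copy()
    n = len(y)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # predicted mortality probability
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        w = soft_threshold(w - lr * grad_w, lr * lam)  # gradient step, then L1 shrinkage
        b = b - lr * grad_b                            # intercept is not penalized
    return w, b

def federated_lasso(sites, n_features, rounds=30):
    """Sites return only (w, b) and a sample count; raw rows never leave a site."""
    w_global = np.zeros(n_features)
    b_global = 0.0
    total = sum(len(y) for _, y in sites)
    for _ in range(rounds):
        updates = [(local_update(w_global, b_global, X, y), len(y)) for X, y in sites]
        # Central aggregator: sample-size-weighted average of the site parameters
        w_global = sum(n / total * w for (w, _), n in updates)
        b_global = sum(n / total * b for (_, b), n in updates)
    return w_global, b_global

def auroc(y, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = scores[y == 1]
    neg = scores[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Five hypothetical hospitals with different sample sizes and 5 synthetic features
rng = np.random.default_rng(0)
true_w = np.array([1.5, -1.0, 0.0, 0.0, 0.8])
sites = []
for n in (300, 200, 250, 150, 400):
    X = rng.normal(size=(n, 5))
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_w - 1.0)))).astype(float)
    sites.append((X, y))

w, b = federated_lasso(sites, n_features=5)
for i, (X, y) in enumerate(sites, start=1):
    print(f"site {i}: AUROC = {auroc(y, X @ w + b):.3f}")
```

FedAvg is used here only because it is the simplest parameter-sharing scheme. A pooled baseline would instead concatenate every site's `X` and `y` before training, which is exactly the raw-data movement that federated learning avoids.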