Leveraging graph-based hierarchical medical entity embedding for healthcare applications
Abstract Automatic representation learning of key entities in electronic health record (EHR) data is a critical step for healthcare data mining that turns heterogeneous medical records into structured and actionable information. Here we propose ME2Vec, an algorithmic framework for learning continuou...
Main Authors: | Tong Wu, Yunlong Wang, Yue Wang, Emily Zhao, Yilian Yuan |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Publishing Group, 2021-03-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-021-85255-w |
id |
doaj-e0c414c7ba28497f8db9cf05f2a9f717 |
---|---|
record_format |
Article |
spelling |
doaj-e0c414c7ba28497f8db9cf05f2a9f717; 2021-03-14T12:16:43Z; eng; Nature Publishing Group; Scientific Reports; 2045-2322; 2021-03-01; 111113; 10.1038/s41598-021-85255-w; Leveraging graph-based hierarchical medical entity embedding for healthcare applications; Tong Wu, Yunlong Wang, Yue Wang, Emily Zhao, Yilian Yuan (all Advanced Analytics, IQVIA Inc.); Abstract Automatic representation learning of key entities in electronic health record (EHR) data is a critical step for healthcare data mining that turns heterogeneous medical records into structured and actionable information. Here we propose ME2Vec, an algorithmic framework for learning continuous low-dimensional embedding vectors of the most common entities in EHR: medical services, doctors, and patients. ME2Vec features a hierarchical structure that encapsulates different node embedding schemes to cater for the unique characteristic of each medical entity. To embed medical services, we employ a biased-random-walk-based node embedding that leverages the irregular time intervals of medical services in EHR to embody their relative importance. To embed doctors and patients, we adhere to the principle “it’s what you do that defines you” and derive their embeddings based on their interactions with other types of entities through graph neural network and proximity-preserving network embedding, respectively. Using real-world clinical data, we demonstrate the efficacy of ME2Vec over competitive baselines on diagnosis prediction, readmission prediction, as well as recommending doctors to patients based on their medical conditions. In addition, medical service embeddings pretrained using ME2Vec can substantially improve the performance of sequential models in predicting patients' clinical outcomes. Overall, ME2Vec can serve as a general-purpose representation learning algorithm for EHR data and benefit various downstream tasks in terms of both performance and interpretability.; https://doi.org/10.1038/s41598-021-85255-w |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Tong Wu Yunlong Wang Yue Wang Emily Zhao Yilian Yuan |
title |
Leveraging graph-based hierarchical medical entity embedding for healthcare applications |
publisher |
Nature Publishing Group |
series |
Scientific Reports |
issn |
2045-2322 |
publishDate |
2021-03-01 |
description |
Abstract Automatic representation learning of key entities in electronic health record (EHR) data is a critical step for healthcare data mining that turns heterogeneous medical records into structured and actionable information. Here we propose ME2Vec, an algorithmic framework for learning continuous low-dimensional embedding vectors of the most common entities in EHR: medical services, doctors, and patients. ME2Vec features a hierarchical structure that encapsulates different node embedding schemes to cater for the unique characteristic of each medical entity. To embed medical services, we employ a biased-random-walk-based node embedding that leverages the irregular time intervals of medical services in EHR to embody their relative importance. To embed doctors and patients, we adhere to the principle “it’s what you do that defines you” and derive their embeddings based on their interactions with other types of entities through graph neural network and proximity-preserving network embedding, respectively. Using real-world clinical data, we demonstrate the efficacy of ME2Vec over competitive baselines on diagnosis prediction, readmission prediction, as well as recommending doctors to patients based on their medical conditions. In addition, medical service embeddings pretrained using ME2Vec can substantially improve the performance of sequential models in predicting patients' clinical outcomes. Overall, ME2Vec can serve as a general-purpose representation learning algorithm for EHR data and benefit various downstream tasks in terms of both performance and interpretability. |
url |
https://doi.org/10.1038/s41598-021-85255-w |