Embedding, aligning and reconstructing clinical notes to explore sepsis

Bibliographic Details
Main Authors: Xudong Zhu, Joseph M. Plasek, Chunlei Tang, Wasim Al-Assad, Zhikun Zhang, Yun Xiong, Liqin Wang, Sharmitha Yerneni, Carlos Ortega, Min-Jeoung Kang, Li Zhou, David W. Bates, Patricia C. Dykes
Format: Article
Language: English
Published: BMC 2021-04-01
Series: BMC Research Notes
Subjects: Sepsis; Representation learning; Exploratory analysis; Data driven medicine
Online Access: https://doi.org/10.1186/s13104-021-05529-4
id doaj-d6559c2b62d046deafb10ee2826f67b9
record_format Article
spelling doaj-d6559c2b62d046deafb10ee2826f67b9; 2021-04-18T11:43:08Z; eng; BMC; BMC Research Notes; 1756-0500; 2021-04-01; volume 14, issue 1, pages 1-6; 10.1186/s13104-021-05529-4; Embedding, aligning and reconstructing clinical notes to explore sepsis
Xudong Zhu, Zhikun Zhang, Yun Xiong: Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University
Joseph M. Plasek, Chunlei Tang, Wasim Al-Assad, Liqin Wang, Sharmitha Yerneni, Carlos Ortega, Min-Jeoung Kang, Li Zhou, David W. Bates, Patricia C. Dykes: Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Harvard Medical School
collection DOAJ
language English
format Article
sources DOAJ
author Xudong Zhu
Joseph M. Plasek
Chunlei Tang
Wasim Al-Assad
Zhikun Zhang
Yun Xiong
Liqin Wang
Sharmitha Yerneni
Carlos Ortega
Min-Jeoung Kang
Li Zhou
David W. Bates
Patricia C. Dykes
title Embedding, aligning and reconstructing clinical notes to explore sepsis
publisher BMC
series BMC Research Notes
issn 1756-0500
publishDate 2021-04-01
description Abstract. Objective: Our goal was to research and develop exploratory analysis tools for clinical notes, which are currently underrepresented, limiting the diversity of data insights available for medically relevant applications. Results: We characterize how exploratory analysis can affect representation learning on clinical narratives and present several self-developed tools to explore sepsis. Our experiments focus on patients with sepsis in the MIMIC-III Clinical Database or in our institution’s research patient data repository. First, we found that global embeddings assist in learning local representations of clinical notes. Second, aligning at any specific time facilitates the use of learning models by pooling more available clinical notes to form a training set. Furthermore, reconstruction of the timeline enhances downstream processing techniques by emphasizing temporal expressions and temporal relationships in clinical documentation. We demonstrate that clustering helps plot various types of clinical notes against a scale, which conveys a sense of the range or spread of the data and is useful for understanding data correlations. Appropriate exploratory analysis tools provide keen insights into preprocessing clinical notes, which in turn enhances downstream analysis capabilities and helps make data-driven medicine possible. Our examples can help generate better data representations of clinical documentation for models with improved performance and interpretability.
topic Sepsis
Representation learning
Exploratory analysis
Data driven medicine
url https://doi.org/10.1186/s13104-021-05529-4
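
The abstract's second finding, that aligning notes at a specific time lets more notes be pooled into a training set, can be illustrated with a minimal sketch. This record does not describe the authors' implementation, so everything below is an assumption made for illustration: the function name `align_notes`, the per-patient anchor time (for example, a sepsis onset time chosen by the analyst), the 24-hour window, and the made-up sample notes.

```python
from datetime import datetime


def align_notes(notes, anchor_times, window_hours=24):
    """Re-index notes by hours relative to each patient's anchor time.

    notes: list of (patient_id, timestamp, text) tuples.
    anchor_times: dict mapping patient_id -> anchor datetime
        (hypothetical; e.g. an analyst-chosen sepsis onset time).
    Only notes within +/- window_hours of the anchor are kept.
    """
    aligned = []
    for patient_id, timestamp, text in notes:
        anchor = anchor_times.get(patient_id)
        if anchor is None:
            continue  # skip patients with no anchor time defined
        offset_hours = (timestamp - anchor).total_seconds() / 3600.0
        if abs(offset_hours) <= window_hours:
            aligned.append((patient_id, offset_hours, text))
    # Sort on the shared relative time axis so notes from different
    # patients can be pooled into a single training set.
    aligned.sort(key=lambda row: row[1])
    return aligned


# Made-up example data, not taken from MIMIC-III or any real repository.
notes = [
    ("p1", datetime(2020, 1, 1, 8, 0), "nursing note"),
    ("p1", datetime(2020, 1, 3, 8, 0), "discharge summary"),
    ("p2", datetime(2020, 5, 10, 20, 0), "physician note"),
]
anchor_times = {
    "p1": datetime(2020, 1, 1, 12, 0),
    "p2": datetime(2020, 5, 10, 18, 0),
}
aligned = align_notes(notes, anchor_times)
```

Here the two patients were admitted months apart, yet after alignment their notes sit on one relative time axis (hours before or after the anchor), and the note written well outside the window is dropped. The window width is a design choice that trades the number of pooled notes against how closely they relate to the anchor event.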