Using machine learning for predicting cervical cancer from Swedish electronic health records by mining hierarchical representations.

Electronic health records (EHRs) contain rich documentation regarding disease symptoms and progression, but EHR data is challenging to use for diagnosis prediction due to its high dimensionality, relative scarcity, and substantial level of noise. We investigated how to best represent EHR data for pr...

Bibliographic Details
Main Authors:	Rebecka Weegar, Karin Sundström
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2020-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0237911

id	doaj-25146c0741594b09873adca29c9cc8bd
record_format	Article
spelling	doaj-25146c0741594b09873adca29c9cc8bd 2021-03-03T22:05:03Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2020-01-01 15 8 e0237911 10.1371/journal.pone.0237911 Using machine learning for predicting cervical cancer from Swedish electronic health records by mining hierarchical representations. Rebecka Weegar Karin Sundström Electronic health records (EHRs) contain rich documentation regarding disease symptoms and progression, but EHR data is challenging to use for diagnosis prediction due to its high dimensionality, relative scarcity, and substantial level of noise. We investigated how to best represent EHR data for predicting cervical cancer, a serious disease where early detection is beneficial for the outcome of treatment. A case group of 1321 patients with cervical cancer were matched to ten times as many controls, and for both groups several types of events were extracted from their EHRs. These events included clinical codes, lab results, and contents of free text notes retrieved using a LSTM neural network. Clinical events are described with great variation in EHR texts, leading to a very large feature space. Therefore, an event hierarchy inferred from the textual events was created to represent the clinical texts. Overall, the events extracted from free text notes contributed the most to the final prediction, and the hierarchy of textual events further improved performance. Four classifiers were evaluated for predicting a future cancer diagnosis where Random Forest achieved the best results with an AUC of 0.70 from a year before diagnosis up to 0.97 one day before diagnosis. We conclude that our approach is sound and had excellent discrimination at diagnosis, but only modest discrimination capacity before this point. Since our study objective was earlier disease prediction than such, we propose further work should consider extending patient histories through e.g. the integration of primary health records preceding referral to hospital. https://doi.org/10.1371/journal.pone.0237911
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Rebecka Weegar, Karin Sundström
author_sort	Rebecka Weegar
title	Using machine learning for predicting cervical cancer from Swedish electronic health records by mining hierarchical representations.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2020-01-01
description	Electronic health records (EHRs) contain rich documentation regarding disease symptoms and progression, but EHR data is challenging to use for diagnosis prediction due to its high dimensionality, relative scarcity, and substantial level of noise. We investigated how to best represent EHR data for predicting cervical cancer, a serious disease where early detection is beneficial for the outcome of treatment. A case group of 1321 patients with cervical cancer were matched to ten times as many controls, and for both groups several types of events were extracted from their EHRs. These events included clinical codes, lab results, and contents of free text notes retrieved using a LSTM neural network. Clinical events are described with great variation in EHR texts, leading to a very large feature space. Therefore, an event hierarchy inferred from the textual events was created to represent the clinical texts. Overall, the events extracted from free text notes contributed the most to the final prediction, and the hierarchy of textual events further improved performance. Four classifiers were evaluated for predicting a future cancer diagnosis where Random Forest achieved the best results with an AUC of 0.70 from a year before diagnosis up to 0.97 one day before diagnosis. We conclude that our approach is sound and had excellent discrimination at diagnosis, but only modest discrimination capacity before this point. Since our study objective was earlier disease prediction than such, we propose further work should consider extending patient histories through e.g. the integration of primary health records preceding referral to hospital.
url	https://doi.org/10.1371/journal.pone.0237911

Similar Items