MASK: A Success Story for An International Collaboration
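Main Authors: | Mahmoud Azimaee, Gangamma Kalappa, Nikola Milosevic, Goran Nenadic, Hesam Dadafarin, Mahshid Yassaie, Branson Chen, Sean Ji, Daniella Barron, Elisa Candido, Cheng Qian, Marian Vermeulen |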
---|---|
Format: | Article |
Language: | English |
Published: | Swansea University, 2020-12-01 |
Series: | International Journal of Population Data Science |
Online Access: | https://ijpds.org/article/view/1621 |
id |
doaj-7212312305bc4ddea2ffb536cc04dc1a |
---|---|
citation |
Mahmoud Azimaee (ICES), Gangamma Kalappa (ICES), Nikola Milosevic (University of Manchester), Goran Nenadic (University of Manchester), Hesam Dadafarin (Evenset), Mahshid Yassaie (Evenset), Branson Chen (ICES), Sean Ji (ICES), Daniella Barron (ICES), Elisa Candido (ICES), Cheng Qian (ICES), Marian Vermeulen (ICES). MASK: A Success Story for An International Collaboration. International Journal of Population Data Science, 5(5), 2020-12-01. Swansea University. ISSN 2399-4908. DOI: 10.23889/ijpds.v5i5.1621 |
collection |
DOAJ |
description |
Introduction
A significant amount of valuable information in Electronic Health Records (EHRs), such as laboratory test results or echocardiogram interpretations, is embedded in lengthy free-text fields. These narratives often also contain patients’ personal information. Privacy legislation in many jurisdictions requires this information to be de-identified before the data are made available for research. De-identification can be challenging and time-consuming; in particular, rule-based algorithms may over-mask essential medical terms, such as conditions or devices named after individuals (for example, a rule that masks the surname “Parkinson” will also mask it in “Parkinson’s disease”).
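To illustrate the over-masking problem, the minimal Python sketch below applies a naive surname-list rule to a short clinical sentence. The surname list and the `rule_based_mask` helper are hypothetical and are not ICES’ actual rules; the point is only that a context-free pattern match cannot tell a patient’s name from an eponymous disease or device.

```python
import re

# Hypothetical surname dictionary, for illustration only (not ICES' actual rules).
KNOWN_SURNAMES = {"Parkinson", "Smith", "Swan", "Ganz"}


def rule_based_mask(text: str, surnames: set) -> str:
    """Replace every whole-word match of a listed surname with [NAME], ignoring context."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(s) for s in sorted(surnames)) + r")\b"
    )
    return pattern.sub("[NAME]", text)


note = "Mr. Smith has Parkinson's disease; a Swan-Ganz catheter was placed."
print(rule_based_mask(note, KNOWN_SURNAMES))
# Mr. [NAME] has [NAME]'s disease; a [NAME]-[NAME] catheter was placed.
```

Only “Smith” identifies the patient; the other three replacements remove clinically meaningful terms, which is the kind of loss a context-aware model aims to avoid.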
Objectives and Approach
We aimed to enhance ICES’ existing rule-based application by making it context-aware through Artificial Intelligence (AI). The ICES team collaborated with computer scientists at the University of Manchester, who had previously published work in this area, and with Evenset, a Toronto-based software company. Building on the University of Manchester de-identification framework for named entity recognition (NER), three machine-learning-based NER algorithms were implemented: conditional random fields (CRF), and bidirectional long short-term memory (BiLSTM) recurrent neural networks with either GloVe or ELMo word embeddings. The models were trained on three types of ICES data: laboratory results, electronic medical records (EMR), and echocardiogram data. Evenset developed the user interface and the masking modules.
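The step from NER output to a masked document can be sketched as follows. This is a minimal illustration assuming the model emits one BIO tag per token (`B-NAME`, `I-NAME`, `O`, and so on); the tag set, the hypothetical `predicted_tags`, and the helpers `bio_to_spans` and `mask_tokens` are assumptions, not the interface of Evenset’s masking modules or of the Manchester framework. A context-aware model would tag “John Smith” as a name but leave “Parkinson” in “Parkinson’s disease” untagged, so only the true identifier is replaced.

```python
from typing import List, Optional, Tuple

# Hypothetical BIO scheme: "B-<LABEL>" starts an entity, "I-<LABEL>" continues it, "O" is outside.


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO tags into (start, end_exclusive, label) token spans."""
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag == "O":
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
        elif tag.startswith("B-") or label != tag[2:]:
            # A new entity begins (also tolerates an I- tag with no matching B-).
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        # Otherwise an I- tag continues the current entity; nothing to do.
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def mask_tokens(tokens: List[str], tags: List[str]) -> str:
    """Replace each tagged entity span with a single [LABEL] placeholder."""
    out: List[str] = []
    cursor = 0
    for start, end, label in bio_to_spans(tags):
        out.extend(tokens[cursor:start])  # keep untagged tokens unchanged
        out.append(f"[{label}]")
        cursor = end
    out.extend(tokens[cursor:])
    return " ".join(out)


tokens = ["Mr.", "John", "Smith", "has", "Parkinson", "'s", "disease", "."]
# Hypothetical output of a context-aware NER model for the tokens above.
predicted_tags = ["O", "B-NAME", "I-NAME", "O", "O", "O", "O", "O"]
print(mask_tokens(tokens, predicted_tags))
# Mr. [NAME] has Parkinson 's disease .
```

Joining tokens with spaces is a simplification; a production masker would map spans back to character offsets to preserve the original text layout.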
Results
Preliminary tests have produced promising results. To improve the models’ accuracy, ICES is currently annotating additional data to expand the training datasets. The final framework will be released as an open-source tool for public use.
Conclusion / Implications
A collaborative approach to complex problems such as the de-identification of free-text medical data is highly efficient, especially when stakeholders bring unique sets of expertise, resources, data, and clinical knowledge.
|