MASK: A Success Story for An International Collaboration

Introduction A significant amount of valuable information in Electronic Health Records (EHR) such as laboratory test results or echocardiogram interpretations is embedded in lengthy free-text fields. Often patients’ personal information is also included in these narratives. Privacy legislation in d...

Bibliographic Details
Main Authors: Mahmoud Azimaee, Gangamma Kalappa, Nikola Milosevic, Goran Nenadic, Hesam Dadafarin, Mahshid Yassaie, Branson Chen, Sean Ji, Daniella Barron, Elisa Candido, Cheng Qian, Marian Vermeulen
Format: Article
Language: English
Published: Swansea University 2020-12-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/1621
id doaj-7212312305bc4ddea2ffb536cc04dc1a
record_format Article
spelling doaj-7212312305bc4ddea2ffb536cc04dc1a; 2021-02-10T16:41:52Z; eng; Swansea University; International Journal of Population Data Science; ISSN 2399-4908; 2020-12-01; volume 5, issue 5; DOI 10.23889/ijpds.v5i5.1621; MASK: A Success Story for An International Collaboration
Mahmoud Azimaee (ICES)
Gangamma Kalappa (ICES)
Nikola Milosevic (University of Manchester)
Goran Nenadic (University of Manchester)
Hesam Dadafarin (Evenset)
Mahshid Yassaie (Evenset)
Branson Chen (ICES)
Sean Ji (ICES)
Daniella Barron (ICES)
Elisa Candido (ICES)
Cheng Qian (ICES)
Marian Vermeulen (ICES)
https://ijpds.org/article/view/1621
collection DOAJ
language English
format Article
sources DOAJ
author Mahmoud Azimaee
Gangamma Kalappa
Nikola Milosevic
Goran Nenadic
Hesam Dadafarin
Mahshid Yassaie
Branson Chen
Sean Ji
Daniella Barron
Elisa Candido
Cheng Qian
Marian Vermeulen
author_sort Mahmoud Azimaee
title MASK: A Success Story for An International Collaboration
publisher Swansea University
series International Journal of Population Data Science
issn 2399-4908
publishDate 2020-12-01
description Introduction: A significant amount of valuable information in Electronic Health Records (EHR), such as laboratory test results or echocardiogram interpretations, is embedded in lengthy free-text fields. Patients’ personal information is often included in these narratives as well. Privacy legislation in different jurisdictions requires this information to be de-identified before it is made available for research. This process can be challenging and time-consuming. In particular, rule-based algorithms may over-mask essential medical terms, conditions, or devices that are named after individuals. Objectives and Approach: We aimed to enhance ICES’ existing rule-based application by applying Artificial Intelligence (AI) to make it context-driven. The ICES team collaborated with computer scientists at the University of Manchester, who had already published work in this area, and with Evenset, a Toronto-based software company. Based on the University of Manchester de-identification framework for named entity recognition, three machine learning-based algorithms were implemented: a conditional random field (CRF), and BiLSTM recurrent neural networks with GloVe and with ELMo word embeddings. The models were trained on three types of ICES data: laboratory results, Electronic Medical Record (EMR) data, and echocardiogram data. Evenset developed the user interface and the masking modules. Results: Preliminary tests have generated very promising results. To improve the accuracy of the models, additional data annotation to expand the training datasets is currently being undertaken at ICES. The final framework will be made available to the public as an open-source tool. Conclusion / Implications: A collaborative approach to solving complex problems such as de-identification of text-based medical data is highly efficient, especially where stakeholders bring distinct expertise, resources, data, and clinical knowledge.
url https://ijpds.org/article/view/1621
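
The over-masking problem mentioned in the description can be illustrated with a toy example. The sketch below is not the ICES application or the MASK framework; the surname list, the context-word list, and both function names are hypothetical, chosen only to show how a purely dictionary-based rule masks eponymous medical terms and how even a little surrounding context changes the outcome. The models named in the record learn such context from annotated data instead of relying on a hand-written list.

```python
import re

# Hypothetical surname dictionary; a real rule-based system would use a far larger one.
SURNAMES = {"Parkinson", "Foley", "Smith"}

# Hypothetical context words suggesting an eponymous medical term, not a person.
MEDICAL_CONTEXT = {"disease", "catheter", "syndrome"}

# Split text into runs of letters, runs of other symbols, and runs of whitespace,
# so that joining the tokens reproduces the input exactly.
TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z\s]+|\s+")


def mask_naive(text: str) -> str:
    """Mask every dictionary surname, regardless of context."""
    return "".join("[NAME]" if tok in SURNAMES else tok for tok in TOKEN.findall(text))


def mask_with_context(text: str) -> str:
    """Mask a surname only if neither of the next two words is a medical context word."""
    tokens = TOKEN.findall(text)
    # Positions of alphabetic tokens, so punctuation and spaces are skipped when looking ahead.
    words = [(i, tok) for i, tok in enumerate(tokens) if tok.isalpha()]
    out = list(tokens)
    for pos, (i, tok) in enumerate(words):
        if tok not in SURNAMES:
            continue
        following = [w.lower() for _, w in words[pos + 1 : pos + 3]]
        if not any(w in MEDICAL_CONTEXT for w in following):
            out[i] = "[NAME]"
    return "".join(out)


note = "Dr. Smith noted Parkinson's disease; Foley catheter in place."
naive_result = mask_naive(note)
context_result = mask_with_context(note)
```

With this input, `mask_naive` replaces all three surnames, including the ones in "Parkinson's disease" and "Foley catheter", while `mask_with_context` masks only "Smith". A fixed context list like this one is brittle, which is one motivation for learned sequence models.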