MASK: A Success Story for An International Collaboration

Bibliographic Details
Main Authors: Mahmoud Azimaee, Gangamma Kalappa, Nikola Milosevic, Goran Nenadic, Hesam Dadafarin, Mahshid Yassaie, Branson Chen, Sean Ji, Daniella Barron, Elisa Candido, Cheng Qian, Marian Vermeulen
Format: Article
Language: English
Published: Swansea University 2020-12-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/1621
Description
Summary:
Introduction: A significant amount of valuable information in Electronic Health Records (EHR), such as laboratory test results or echocardiogram interpretations, is embedded in lengthy free-text fields. These narratives often also include patients’ personal information. Privacy legislation in different jurisdictions requires de-identification of this information before it is made available for research. This process can be challenging and time-consuming. In particular, rule-based algorithms may over-mask essential medical terms, conditions, or devices that are named after individuals.

Objectives and Approach: We aimed to make ICES’ existing rule-based application context-aware by applying artificial intelligence (AI). The ICES team collaborated with computer scientists at the University of Manchester, who had already published work in this area, and with Evenset, a Toronto-based software company. Based on the University of Manchester de-identification framework, three machine learning-based algorithms for named entity recognition were implemented: conditional random fields (CRF), and BiLSTM recurrent neural networks with GloVe and with ELMo word embeddings. The models were trained on three different types of ICES data: laboratory results, Electronic Medical Record (EMR) data and echocardiogram data. Evenset developed the user interface and the masking modules.

Results: Preliminary tests have generated very promising results. To improve the accuracy of the models, ICES is currently annotating additional data to expand the training datasets. The final framework will be available to the public as an open-source tool.

Conclusion / Implications: A collaborative approach to solving complex problems such as de-identification of text-based medical data is highly efficient, especially where stakeholders bring distinct expertise, resources, data and clinical knowledge.
ISSN:2399-4908