Extracting Family History Information From Electronic Health Records: Natural Language Processing Analysis

Background: The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes. Objective: The aim of this...

Bibliographic Details
Main Authors:	Rybinski, Maciej, Dai, Xiang, Singh, Sonit, Karimi, Sarvnaz, Nguyen, Anthony
Format:	Article
Language:	English
Published:	JMIR Publications 2021-04-01
Series:	JMIR Medical Informatics
Online Access:	https://medinform.jmir.org/2021/4/e24020

id	doaj-227a6dc30f4a4254ad1a3053f79f318d
record_format	Article
spelling	doaj-227a6dc30f4a4254ad1a3053f79f318d; 2021-04-30T15:01:20Z; eng; JMIR Publications; JMIR Medical Informatics; ISSN 2291-9694; 2021-04-01; 9(4):e24020; doi:10.2196/24020; https://medinform.jmir.org/2021/4/e24020
collection	DOAJ
language	English
format	Article
sources	DOAJ
issn	2291-9694
publishDate	2021-04-01
description	Background: The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes. Objective: The aim of this study is to develop automated methods that enable access to FH data through natural language processing. Methods: We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods for extracting family member (FM) information from text and coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems. Results: Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%. Conclusions: Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.
url	https://medinform.jmir.org/2021/4/e24020

Similar Items