Automatic extraction of imaging observation and assessment categories from breast magnetic resonance imaging reports with natural language processing
Abstract. Background: Structured reports are not widely used and thus most reports exist in the form of free text. The process of data extraction by experts is time-consuming and error-prone, whereas data extraction by natural language processing (NLP) is a potential solution that could improve dia...
Main Authors: | Yi Liu, Li-Na Zhu, Qing Liu, Chao Han, Xiao-Dong Zhang, Xiao-Ying Wang, Peng Lyu |
---|---|
Format: | Article |
Language: | English |
Published: | Wolters Kluwer, 2019-07-01 |
Series: | Chinese Medical Journal |
Online Access: | http://journals.lww.com/10.1097/CM9.0000000000000301 |
id |
doaj-f4e200fda72448478b5361e3a1d13497 |
---|---|
record_format |
Article |
spelling |
doaj-f4e200fda72448478b5361e3a1d13497; 2020-12-02T07:49:33Z; eng; Wolters Kluwer; Chinese Medical Journal; ISSN 0366-6999, 2542-5641; 2019-07-01; vol. 132, no. 14, pp. 1673-1680; 10.1097/CM9.0000000000000301; 201907200-00006; Automatic extraction of imaging observation and assessment categories from breast magnetic resonance imaging reports with natural language processing; Yi Liu, Li-Na Zhu, Qing Liu, Chao Han, Xiao-Dong Zhang, Xiao-Ying Wang, Peng Lyu (all authors: Department of Radiology, Peking University First Hospital, Beijing 100034, China); http://journals.lww.com/10.1097/CM9.0000000000000301 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yi Liu Li-Na Zhu Qing Liu Chao Han Xiao-Dong Zhang Xiao-Ying Wang Peng Lyu |
author_sort |
Yi Liu |
title |
Automatic extraction of imaging observation and assessment categories from breast magnetic resonance imaging reports with natural language processing |
publisher |
Wolters Kluwer |
series |
Chinese Medical Journal |
issn |
0366-6999 2542-5641 |
publishDate |
2019-07-01 |
description |
Abstract. Background: Structured reports are not widely used, so most reports exist as free text. Data extraction by experts is time-consuming and error-prone, whereas data extraction by natural language processing (NLP) is a potential solution that could improve diagnostic efficiency and accuracy. The purpose of this study was to evaluate an NLP program that determines American College of Radiology Breast Imaging Reporting and Data System (BI-RADS) descriptors and final assessment categories from breast magnetic resonance imaging (MRI) reports.
Methods: This cross-sectional study involved 2330 breast MRI reports in the electronic medical record from 2009 to 2017. We used 1635 reports to create a revised BI-RADS MRI lexicon and synonym lists and to iteratively develop an NLP system. The remaining 695 reports, which were not used for developing the system, served as an independent test set for the final evaluation of the NLP system. The recall and precision of the NLP algorithm in detecting the revised BI-RADS MRI descriptors and BI-RADS categories from the free-text reports were evaluated against a reference standard of manual human review.
Results: There was a high level of agreement between the two manual reviewers, with a κ value of 0.95. For all breast imaging reports, the NLP algorithm demonstrated a recall of 78.5% and a precision of 86.1% for correct identification of the revised BI-RADS MRI descriptors and the BI-RADS categories. NLP generated the total results in <1 s, whereas the two manual reviewers averaged 3.38 and 3.23 min per report, respectively.
Conclusions: The NLP algorithm demonstrates high recall and precision for information extraction from free-text reports. This approach will help to narrow the gap between unstructured report text and the structured data needed for decision support and other applications. |
url |
http://journals.lww.com/10.1097/CM9.0000000000000301 |
_version_ |
1724408179681394688 |
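
The abstract reports recall and precision against manual review, and a κ value for agreement between the two reviewers. As a rough illustration of how such figures are commonly computed, here is a minimal Python sketch. It is not the authors' evaluation code: the function names `precision_recall` and `cohen_kappa`, the representation of extracted items as `(report_id, item)` pairs, and the toy data are all assumptions made for this example.

```python
from collections import Counter


def precision_recall(predicted, reference):
    """Precision and recall of extracted items against a manual reference.

    Both arguments are sets of (report_id, item) pairs; this pairing is a
    hypothetical representation chosen for the sketch.
    """
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two reviewers labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both reviewers must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each reviewer's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy data (invented for illustration, not taken from the study).
reference = {("r1", "mass"), ("r1", "BI-RADS 4"),
             ("r2", "non-mass enhancement"), ("r2", "BI-RADS 3")}
predicted = {("r1", "mass"), ("r1", "BI-RADS 4"),
             ("r2", "BI-RADS 3"), ("r2", "focus")}
precision, recall = precision_recall(predicted, reference)

reviewer_a = ["4", "3", "2", "4"]
reviewer_b = ["4", "3", "2", "3"]
kappa = cohen_kappa(reviewer_a, reviewer_b)
```

Treating each extracted descriptor or category as a pair keyed by report makes the set intersection count only items found in the right report; the study's actual unit of comparison may differ.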