Information Extraction from Pathology Report for Cancer Registry

Master's === Tzu Chi University === Institute of Medical Informatics === 97 === Cancer registration, which is important for medical research, is the process of extracting key information from cancer patients’ medical reports and coding it into structured fields. Because most of a medical report is written as unstructured free text, it’s us...


Bibliographic Details
Main Authors: Hui-Chen Hu, 胡惠珍
Other Authors: Wen-Cheng Lin
Format: Others
Language: zh-TW
Online Access: http://ndltd.ncl.edu.tw/handle/29591764407399847700
id ndltd-TW-097TCU05674001
record_format oai_dc
spelling ndltd-TW-097TCU05674001 2015-10-13T14:49:20Z http://ndltd.ncl.edu.tw/handle/29591764407399847700 Information Extraction from Pathology Report for Cancer Registry 輔助癌症登記之病理報告資訊擷取系統 Hui-Chen Hu 胡惠珍 Master's Tzu Chi University Institute of Medical Informatics 97 Wen-Cheng Lin 林紋正 thesis 51 zh-TW
collection NDLTD
description Master's === Tzu Chi University === Institute of Medical Informatics === 97 === Cancer registration, which is important for medical research, is the process of extracting key information from cancer patients’ medical reports and coding it into structured fields. Because most of a medical report is written as unstructured free text, it is usually coded manually by trained personnel according to a set of rules, which is time-consuming. In this study, we developed a system that extracts information from pathology reports for the cancer registry. We use MedLEE, a natural language processor, to convert a preprocessed pathology report into a semi-structured, coded XML format. We built a “cancer registry rule management platform” to help cancer registrars encode their knowledge as registry rules, speed up rule creation, and improve rule quality. After a user enters a new registry rule on the platform, the “Rule-XQuery converter” converts the rule into XQuery. After some modification by programmers, the XQuery achieves good accuracy. For the few variables that cannot be extracted with XQuery, we built a “Java information extraction module” that extracts their values from the pathology report by other methods, such as keyword search. With the converter and the Java module, the system can quickly establish extraction rules for another cancer. Once the extraction rules are established, the “post-processor” uses the XQueries and Java programs to extract the values of the variables from the XML file or the pathology report. In addition, we provide a “registry validation interface” so that users can conveniently check the correctness of the extracted results. If a user finds a mistake in a rule during validation, he or she can return to the “cancer registry rule management platform” to modify the rule.
In this study, we established three cancer registry rule sets, for oral cavity carcinoma, hepatocellular carcinoma, and colorectal carcinoma. We compared the results extracted by our system with the manually coded cancer registries. The XQuery for oral cavity carcinoma was written by one programmer. The rules for hepatocellular carcinoma and colorectal carcinoma were established using the “cancer registry rule management platform”. After some modification by the programmer, the rule sets achieve accuracy rates of 93.15%, 92.11%, and 90.98%, respectively. One registrar who used the system reported that she was satisfied with its performance. The extracted cancer registry can serve as a “draft” for the final registry and save cancer registrars time and effort.
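The abstract mentions a Java module that falls back to keyword search for variables XQuery cannot reach, but this record gives no detail of how it works. The following is only a minimal sketch of what such a keyword-based fallback might look like. The class name KeywordExtractor, the variable names tumor_size_cm and margin_status, and the regular expressions are all hypothetical and are not taken from the thesis.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordExtractor {
    // Hypothetical registry variables and keyword patterns (assumptions for
    // illustration only; the thesis's actual keyword lists are not in this record).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("tumor_size_cm", Pattern.compile(
                "tumou?r\\s+size\\s*[:=]?\\s*(\\d+(?:\\.\\d+)?)\\s*cm",
                Pattern.CASE_INSENSITIVE));
        RULES.put("margin_status", Pattern.compile(
                "margins?\\s*[:=]?\\s*(free|involved|positive|negative)",
                Pattern.CASE_INSENSITIVE));
    }

    // Returns the first value matched for one variable, or empty if the
    // variable is unknown or its keyword pattern does not occur in the report.
    public static Optional<String> extract(String variable, String reportText) {
        Pattern pattern = RULES.get(variable);
        if (pattern == null) {
            return Optional.empty();
        }
        Matcher matcher = pattern.matcher(reportText);
        return matcher.find()
                ? Optional.of(matcher.group(1).toLowerCase(Locale.ROOT))
                : Optional.empty();
    }

    // Runs every rule against the report and collects the variables that matched.
    public static Map<String, String> extractAll(String reportText) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String variable : RULES.keySet()) {
            extract(variable, reportText).ifPresent(v -> values.put(variable, v));
        }
        return values;
    }

    public static void main(String[] args) {
        String report = "Tumor size: 2.5 cm. Surgical margin: free.";
        System.out.println(extractAll(report));
    }
}
```

Variables with no match are simply left out of the result, which fits the abstract's framing of the output as a draft that a registrar validates and completes.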