Information Extraction from Pathology Report for Cancer Registry

Master's === Tzu Chi University === Institute of Medical Informatics === 97 === Cancer registration, which is important for medical research, is the process of extracting key information from a cancer patient’s medical reports and coding it into structured fields. Because most of a medical report is written as unstructured free text, it’s us...

Bibliographic Details
Main Authors:	Hui-Chen Hu, 胡惠珍
Other Authors:	Wen-Cheng Lin
Format:	Others
Language:	zh-TW
Online Access:	http://ndltd.ncl.edu.tw/handle/29591764407399847700

id	ndltd-TW-097TCU05674001
record_format	oai_dc
spelling	ndltd-TW-097TCU05674001 2015-10-13T14:49:20Z http://ndltd.ncl.edu.tw/handle/29591764407399847700 Information Extraction from Pathology Report for Cancer Registry 輔助癌症登記之病理報告資訊擷取系統 Hui-Chen Hu 胡惠珍 Master's, Tzu Chi University, Institute of Medical Informatics, 97 Wen-Cheng Lin 林紋正 degree thesis ; thesis 51 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === Tzu Chi University === Institute of Medical Informatics === 97 === Cancer registration, which is important for medical research, is the process of extracting key information from a cancer patient’s medical reports and coding it into structured fields. Because most of a medical report is written as unstructured free text, it is usually coded manually by trained personnel according to a set of rules, which is time-consuming. In this study, we developed a system that extracts information from pathology reports for the cancer registry. We use MedLEE, a natural language processor, to convert a preprocessed pathology report into a semi-structured, coded XML format. We constructed a “cancer registry rule management platform” to help cancer registrars encode their knowledge as registry rules, speed up the creation of rules, and improve their quality. After the user keys in a new registry rule on the platform, the “Rule-XQuery converter” converts the rule into XQuery. After some modification by programmers, the XQuery achieves good accuracy. For the few variables that cannot be extracted by XQuery, we built a “Java information extraction module” that extracts their values from the pathology report by other methods, such as keyword search. With the converter and the Java module, our system can quickly establish extraction rules for another cancer. Once the extraction rules are established, the “post-processor” uses the XQueries and Java programs to extract the values of the variables from the XML file or the pathology report. Furthermore, we provide a “registry validation interface” so that users can conveniently validate the extracted results. If, during validation, the user finds mistakes in a rule, he or she can return to the “cancer registry rule management platform” to modify it. In this study, we established three cancer registry rule sets, for oral cavity carcinoma, hepatocellular carcinoma, and colorectal carcinoma.
We compared the results extracted by our system with the manually coded cancer registries. The XQuery for oral cavity carcinoma was written by one programmer. The rules for hepatocellular carcinoma and colorectal carcinoma were established using the “cancer registry rule management platform”. After some modification by the programmer, the rules achieve good accuracy rates: 93.15%, 92.11%, and 90.98%, respectively. One registrar who used our system reported that she was satisfied with its performance. The extracted cancer registry can serve as a “draft” for the final registry and save cancer registrars time and effort.
author2	Wen-Cheng Lin
author_facet	Wen-Cheng Lin Hui-Chen Hu 胡惠珍
author	Hui-Chen Hu 胡惠珍
spellingShingle	Hui-Chen Hu 胡惠珍 Information Extraction from Pathology Report for Cancer Registry
author_sort	Hui-Chen Hu
title	Information Extraction from Pathology Report for Cancer Registry
title_sort	information extraction from pathology report for cancer registry
url	http://ndltd.ncl.edu.tw/handle/29591764407399847700
work_keys_str_mv	AT huichenhu informationextractionfrompathologyreportforcancerregistry AT húhuìzhēn informationextractionfrompathologyreportforcancerregistry AT huichenhu fǔzhùáizhèngdēngjìzhībìnglǐbàogàozīxùnxiéqǔxìtǒng AT húhuìzhēn fǔzhùáizhèngdēngjìzhībìnglǐbàogàozīxùnxiéqǔxìtǒng
_version_	1717758517418393600

Similar Items