Information Extraction from Pathology Report for Cancer Registry
Master's === Tzu Chi University === Graduate Institute of Medical Informatics === 97 === Cancer registration, which is important for medical research, is the process of extracting key information from a cancer patient’s medical reports and coding it into structured fields. Because most of a medical report is written as unstructured free text, it’s us...
Main Authors: | Hui-Chen Hu, 胡惠珍 |
---|---|
Other Authors: | Wen-Cheng Lin |
Format: | Others |
Language: | zh-TW |
Online Access: | http://ndltd.ncl.edu.tw/handle/29591764407399847700 |
id |
ndltd-TW-097TCU05674001 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-097TCU05674001 2015-10-13T14:49:20Z http://ndltd.ncl.edu.tw/handle/29591764407399847700 Information Extraction from Pathology Report for Cancer Registry 輔助癌症登記之病理報告資訊擷取系統 Hui-Chen Hu 胡惠珍 Master's Tzu Chi University Graduate Institute of Medical Informatics 97 Wen-Cheng Lin 林紋正 thesis 51 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === Tzu Chi University === Graduate Institute of Medical Informatics === 97 === Cancer registration, which is important for medical research, is the process of extracting key information from a cancer patient’s medical reports and coding it into structured fields. Because most of a medical report is written as unstructured free text, it is usually coded manually by trained personnel according to a set of rules, which is time-consuming. In this study, we developed a system that extracts information from pathology reports for the cancer registry.
We use MedLEE, a natural language processor, to convert a preprocessed pathology report into a semi-structured, coded XML format. We built a “cancer registry rule management platform” to help cancer registrars encode their knowledge as registry rules, speed up rule creation, and improve rule quality. After a user keys in a new registry rule on the platform, the “Rule-XQuery converter” converts the rule into XQuery. After some modification by programmers, the XQuery achieves good accuracy. For the few variables that cannot be extracted by XQuery, we built a “Java information extraction module” that extracts their values from the pathology report by other methods, such as keyword search. With the converter and the Java module, our system can quickly establish extraction rules for another cancer. Once the extraction rules are established, the “post-processor” uses the XQueries and Java programs to extract the values of variables from the XML file or the pathology report. Furthermore, we provide a “registry validation interface” that lets users conveniently validate the extracted results. During validation, a user who finds mistakes in a rule can go back to the “cancer registry rule management platform” to modify it.
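The combination of a structured query over the XML with a keyword-search fallback on the raw text can be sketched roughly as follows. This is only an illustration in Python, not the thesis's XQuery or Java code: the XML layout, the element names (`report`, `finding`, `bodyloc`), the sample report text, and the function names are all hypothetical, and the real MedLEE output schema is not reproduced here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML. This is NOT the real MedLEE output schema.
SAMPLE_XML = """
<report>
  <finding name="carcinoma" certainty="high">
    <bodyloc>tongue</bodyloc>
    <descriptor>squamous cell</descriptor>
  </finding>
</report>
"""

# Hypothetical free-text pathology report fragment.
SAMPLE_TEXT = "Squamous cell carcinoma, tongue. Tumor size: 2.5 x 1.8 cm. Margins free."


def extract_from_xml(xml_text, path):
    """Return the text of the first element matching an ElementTree path, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def extract_by_keyword(report_text, keyword):
    """Fallback: return the first number that follows `keyword` in the raw text, or None."""
    pattern = re.escape(keyword) + r"\D*?(\d+(?:\.\d+)?)"
    match = re.search(pattern, report_text, re.IGNORECASE)
    return float(match.group(1)) if match else None


def extract_variables(xml_text, report_text):
    """Fill registry variables: structured query first, keyword search where needed."""
    return {
        # Stand-in for a rule that was converted into a query over the XML.
        "primary_site": extract_from_xml(xml_text, "./finding[@name='carcinoma']/bodyloc"),
        # Stand-in for a variable that is only recoverable from the raw text.
        "tumor_size_cm": extract_by_keyword(report_text, "tumor size"),
    }


if __name__ == "__main__":
    print(extract_variables(SAMPLE_XML, SAMPLE_TEXT))
```

Keeping the two extraction paths as separate functions mirrors the split between query-based rules and the fallback module, so a variable can move from one path to the other without touching the rest of the pipeline.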
In this study, we established three cancer registry rule sets, for oral cavity carcinoma, hepatocellular carcinoma, and colorectal carcinoma, and compared the results extracted by our system with the manually coded cancer registries. The XQuery for oral cavity carcinoma was written by one programmer. The rules for hepatocellular carcinoma and colorectal carcinoma were established using the “cancer registry rule management platform”. After some modification by the programmer, the rules achieved good accuracy rates of 93.15%, 92.11%, and 90.98%, respectively. One registrar who used our system reported that she was satisfied with its performance. The extracted cancer registry could serve as a “draft” for the final registry and save cancer registrars time and effort.
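The abstract does not say how the accuracy rate was computed. One plausible reading, assumed here, is the percentage of manually coded field values that the system reproduces exactly. The sketch below (in Python, with made-up case IDs and values that are not data from the thesis) shows that calculation.

```python
def accuracy_rate(extracted, manual):
    """Percentage of manually coded fields whose extracted value matches exactly.

    Both arguments map a case ID to a dict of {variable name: coded value}.
    A field missing from the extracted results counts as a mismatch.
    """
    total = 0
    correct = 0
    for case_id, manual_fields in manual.items():
        extracted_fields = extracted.get(case_id, {})
        for variable, manual_value in manual_fields.items():
            total += 1
            if extracted_fields.get(variable) == manual_value:
                correct += 1
    if total == 0:
        raise ValueError("no manually coded fields to compare against")
    return 100.0 * correct / total


# Illustrative data only: hypothetical cases, variables, and codes.
MANUAL_EXAMPLE = {
    "case-1": {"primary_site": "tongue", "grade": "2"},
    "case-2": {"primary_site": "liver", "grade": "3"},
}
EXTRACTED_EXAMPLE = {
    "case-1": {"primary_site": "tongue", "grade": "2"},
    "case-2": {"primary_site": "liver", "grade": "1"},
}

if __name__ == "__main__":
    # Three of the four fields match, so this prints 75.0.
    print(accuracy_rate(EXTRACTED_EXAMPLE, MANUAL_EXAMPLE))
```

Other definitions are possible, for example per-report accuracy where a report counts as correct only if every field matches; the per-field version is used here only because it is the simplest.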
|
author2 |
Wen-Cheng Lin |
author |
Hui-Chen Hu 胡惠珍 |
title |
Information Extraction from Pathology Report for Cancer Registry |
url |
http://ndltd.ncl.edu.tw/handle/29591764407399847700 |