Comparison of Three Information Sources for Smoking Information in Electronic Health Records
Main Authors: | Liwei Wang, Xiaoyang Ruan, Ping Yang, Hongfang Liu |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing, 2016-01-01 |
Series: | Cancer Informatics |
Online Access: | https://doi.org/10.4137/CIN.S40604 |
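The abstract (description field) of this record outlines an NLP pipeline in three steps: smoking-related EMR text is retrieved, a pattern-based module extracts smoking mentions, and heuristic rules roll those mentions up into a patient-level smoking status. The study's actual patterns and rules are not given in this record, so the Python sketch below is only illustrative. The regular expressions, the `ever`/`never` mention labels, the `smoker`/`non-smoker`/`unknown` categories, and the functions `extract_mentions`, `extract_pack_years`, and `patient_status` are all hypothetical.

```python
import re
from typing import Iterable

# Hypothetical patterns for illustration; NOT the rule set used in the study.
PACK_YEARS = re.compile(r"(\d+(?:\.\d+)?)\s*(?:pack[- ]?years?|pk[- ]?yrs?)\b", re.IGNORECASE)

NEVER_PATTERNS = [
    re.compile(r"\b(?:never|non)[- ]?smoker\b", re.IGNORECASE),
    re.compile(r"\bnever smoked\b", re.IGNORECASE),
    re.compile(r"\bdenies (?:any )?(?:tobacco|smoking)\b", re.IGNORECASE),
]
EVER_PATTERNS = [
    re.compile(r"\b(?:current|former|ex|past)[- ]?smoker\b", re.IGNORECASE),
    re.compile(r"\bquit smoking\b", re.IGNORECASE),
    re.compile(r"\bcurrently smokes\b", re.IGNORECASE),
    PACK_YEARS,  # a stated pack-year amount is taken to imply some smoking history
]


def extract_mentions(note: str) -> list[str]:
    """Pattern-based step: emit an 'ever' or 'never' label for each pattern match in one note."""
    labels = []
    for pattern in NEVER_PATTERNS:
        labels.extend("never" for _ in pattern.finditer(note))
    for pattern in EVER_PATTERNS:
        labels.extend("ever" for _ in pattern.finditer(note))
    return labels


def extract_pack_years(note: str) -> list[float]:
    """Smoking amount: every pack-year figure mentioned in the note."""
    return [float(value) for value in PACK_YEARS.findall(note)]


def patient_status(notes: Iterable[str]) -> str:
    """Assumed heuristic roll-up: any 'ever' mention -> smoker; else any 'never' -> non-smoker."""
    labels = [label for note in notes for label in extract_mentions(note)]
    if "ever" in labels:
        return "smoker"
    if "never" in labels:
        return "non-smoker"
    return "unknown"
```

For example, `patient_status(["Nonsmoker.", "Quit smoking in 2001."])` returns `"smoker"` because, under this assumed rule, any ever-smoking mention outweighs a never-smoker mention across a patient's notes. Real clinical text would also need negation and temporality handling, which this sketch omits.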
id | doaj-3824bc799ad943499296a0c62fa64a79 |
---|---|
record_format | Article |
spelling | doaj-3824bc799ad943499296a0c62fa64a79; 2020-11-25T03:34:05Z; eng; SAGE Publishing; Cancer Informatics; ISSN 1176-9351; 2016-01-01; vol. 15; 10.4137/CIN.S40604; Liwei Wang, Xiaoyang Ruan, Ping Yang, Hongfang Liu (all: Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA) |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Liwei Wang, Xiaoyang Ruan, Ping Yang, Hongfang Liu |
title | Comparison of Three Information Sources for Smoking Information in Electronic Health Records |
publisher | SAGE Publishing |
series | Cancer Informatics |
issn | 1176-9351 |
publishDate | 2016-01-01 |
description | Objective: The primary aim was to compare the independent and joint performance of retrieving smoking status from different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Diseases, Ninth Revision [ICD-9]). We also compared the performance of retrieving smoking strength information (ie, heavy/light smoker) from narrative text and PPI. Materials and Methods: Our study leveraged an existing lung cancer cohort whose smoking status, amount, and strength information had been manually chart-reviewed. On the NLP side, smoking-related electronic medical record (EMR) data were retrieved first. A pattern-based smoking information extraction module was then implemented to extract smoking-related information, and heuristic rules were applied to obtain smoking status-related information. Smoking information was also obtained from structured data sources based on diagnosis codes and PPI. Sensitivity, specificity, and accuracy were measured on patients with coverage, where coverage is the proportion of patients whose smoking status/strength could be effectively determined. Results: NLP alone had the best overall performance for smoking status extraction (patient coverage: 0.88; sensitivity: 0.97; specificity: 0.70; accuracy: 0.88); combining PPI with NLP further improved patient coverage to 0.96. ICD-9 codes provided no additional improvement over NLP alone or over NLP combined with PPI. For smoking strength, combining NLP with PPI gave a slight improvement over NLP alone. Conclusion: These findings suggest that narrative text could serve as a more reliable and comprehensive source of smoking-related information than structured data sources. PPI, a readily available structured data source, could be used as a complementary source for more comprehensive patient coverage. |
url | https://doi.org/10.4137/CIN.S40604 |
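The evaluation described in the abstract (coverage first, then sensitivity, specificity, and accuracy on covered patients against chart-reviewed labels) and the idea of combining sources can be sketched as below. This is an illustration under stated assumptions. The fallback rule in `combine` (keep the NLP label, consult PPI only when NLP gives no status), the label strings, and the names `Metrics`, `combine`, and `evaluate` are hypothetical. The toy data are made up and are not results from the study.

```python
from dataclasses import dataclass

UNKNOWN = "unknown"  # hypothetical label for "status could not be determined"


@dataclass
class Metrics:
    coverage: float     # fraction of gold-standard patients with a determined status
    sensitivity: float  # this and the rest are computed on covered patients only
    specificity: float
    accuracy: float


def combine(primary: dict[str, str], fallback: dict[str, str]) -> dict[str, str]:
    """Assumed rule: keep the primary label; use the fallback only when primary is unknown."""
    combined = {}
    for pid in primary.keys() | fallback.keys():
        label = primary.get(pid, UNKNOWN)
        combined[pid] = label if label != UNKNOWN else fallback.get(pid, UNKNOWN)
    return combined


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def evaluate(predicted: dict[str, str], gold: dict[str, str], positive: str = "smoker") -> Metrics:
    """Coverage over all gold patients; confusion-matrix metrics over covered patients only."""
    covered = [pid for pid in gold if predicted.get(pid, UNKNOWN) != UNKNOWN]
    tp = sum(predicted[p] == positive and gold[p] == positive for p in covered)
    fn = sum(predicted[p] != positive and gold[p] == positive for p in covered)
    tn = sum(predicted[p] != positive and gold[p] != positive for p in covered)
    fp = sum(predicted[p] == positive and gold[p] != positive for p in covered)
    return Metrics(
        coverage=_ratio(len(covered), len(gold)),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        accuracy=_ratio(tp + tn, len(covered)),
    )


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    gold = {"p1": "smoker", "p2": "smoker", "p3": "non-smoker", "p4": "non-smoker", "p5": "smoker"}
    nlp = {"p1": "smoker", "p2": "smoker", "p3": "smoker", "p4": "non-smoker", "p5": UNKNOWN}
    ppi = {"p3": "non-smoker", "p5": "smoker"}
    print("NLP alone:", evaluate(nlp, gold))
    print("NLP + PPI:", evaluate(combine(nlp, ppi), gold))
```

On the toy data, NLP alone covers 4 of 5 patients (coverage 0.8); falling back to PPI for the uncovered patient raises coverage to 1.0 while sensitivity and specificity stay the same. A third source such as ICD-9 could be chained the same way, for example `combine(combine(nlp, ppi), icd9)`.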