Comparison of Three Information Sources for Smoking Information in Electronic Health Records

Objective: The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Dis...

Bibliographic Details
Main Authors: Liwei Wang, Xiaoyang Ruan, Ping Yang, Hongfang Liu
Format: Article
Language: English
Published: SAGE Publishing 2016-01-01
Series: Cancer Informatics
Online Access: https://doi.org/10.4137/CIN.S40604
id doaj-3824bc799ad943499296a0c62fa64a79
record_format Article
spelling doaj-3824bc799ad943499296a0c62fa64a79; 2020-11-25T03:34:05Z; eng; SAGE Publishing; Cancer Informatics; 1176-9351; 2016-01-01; 15; 10.4137/CIN.S40604; Comparison of Three Information Sources for Smoking Information in Electronic Health Records; Liwei Wang, Xiaoyang Ruan, Ping Yang, Hongfang Liu (all: Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA); https://doi.org/10.4137/CIN.S40604
collection DOAJ
language English
format Article
sources DOAJ
author Liwei Wang
Xiaoyang Ruan
Ping Yang
Hongfang Liu
spellingShingle Liwei Wang
Xiaoyang Ruan
Ping Yang
Hongfang Liu
Comparison of Three Information Sources for Smoking Information in Electronic Health Records
Cancer Informatics
author_facet Liwei Wang
Xiaoyang Ruan
Ping Yang
Hongfang Liu
author_sort Liwei Wang
title Comparison of Three Information Sources for Smoking Information in Electronic Health Records
title_short Comparison of Three Information Sources for Smoking Information in Electronic Health Records
title_full Comparison of Three Information Sources for Smoking Information in Electronic Health Records
title_fullStr Comparison of Three Information Sources for Smoking Information in Electronic Health Records
title_full_unstemmed Comparison of Three Information Sources for Smoking Information in Electronic Health Records
title_sort comparison of three information sources for smoking information in electronic health records
publisher SAGE Publishing
series Cancer Informatics
issn 1176-9351
publishDate 2016-01-01
description Objective: The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Diseases, Ninth Revision [ICD-9]). We also compared the performance of retrieving smoking strength information (ie, heavy/light smoker) from narrative text and PPI. Materials and Methods: Our study leveraged an existing lung cancer cohort for smoking status, amount, and strength information, which was manually chart-reviewed. On the NLP side, smoking-related electronic medical record (EMR) data were retrieved first. A pattern-based smoking information extraction module was then implemented to extract smoking-related information. After that, heuristic rules were used to obtain smoking status-related information. Smoking information was also obtained from structured data sources based on diagnosis codes and PPI. Sensitivity, specificity, and accuracy were measured using patients with coverage (ie, the proportion of patients whose smoking status/strength can be effectively determined). Results: NLP alone has the best overall performance for smoking status extraction (patient coverage: 0.88; sensitivity: 0.97; specificity: 0.70; accuracy: 0.88); combining PPI with NLP further improved patient coverage to 0.96. ICD-9 does not provide additional improvement over NLP or over its combination with PPI. For smoking strength, combining NLP with PPI yields a slight improvement over NLP alone. Conclusion: These findings suggest that narrative text could serve as a more reliable and comprehensive source for obtaining smoking-related information than structured data sources. PPI, the readily available structured data, could be used as a complementary source for more comprehensive patient coverage.
url https://doi.org/10.4137/CIN.S40604
work_keys_str_mv AT liweiwang comparisonofthreeinformationsourcesforsmokinginformationinelectronichealthrecords
AT xiaoyangruan comparisonofthreeinformationsourcesforsmokinginformationinelectronichealthrecords
AT pingyang comparisonofthreeinformationsourcesforsmokinginformationinelectronichealthrecords
AT hongfangliu comparisonofthreeinformationsourcesforsmokinginformationinelectronichealthrecords
_version_ 1724560714640654336
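
Note on the evaluation measures named in the description: the abstract reports patient coverage alongside sensitivity, specificity, and accuracy computed on covered patients. The Python sketch below shows one way such figures could be computed. It is illustrative only and is not the authors' code: the function names `evaluate` and `combine` are hypothetical, the rule "use NLP, fall back to PPI when NLP gives no answer" is an assumption about how two sources might be combined, and the five-patient data set is invented for the example and does not come from the study.

```python
from typing import List, Optional, Sequence


def combine(primary: Sequence[Optional[bool]],
            fallback: Sequence[Optional[bool]]) -> List[Optional[bool]]:
    """Assumed combination rule: keep the primary label, else use the fallback."""
    return [p if p is not None else f for p, f in zip(primary, fallback)]


def evaluate(predicted: Sequence[Optional[bool]], gold: Sequence[bool]) -> dict:
    """Coverage over all patients; other metrics over covered patients only.

    A prediction of None means the source could not determine smoking status.
    True means smoker, False means non-smoker.
    """
    covered = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    tp = sum(1 for p, g in covered if p and g)
    tn = sum(1 for p, g in covered if not p and not g)
    fp = sum(1 for p, g in covered if p and not g)
    fn = sum(1 for p, g in covered if not p and g)

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "coverage": ratio(len(covered), len(gold)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(covered)),
    }


# Invented toy data (not from the study): chart-review labels and two sources.
gold = [True, True, True, False, False]
nlp = [True, True, None, False, True]
ppi = [None, True, True, None, False]

nlp_only = evaluate(nlp, gold)
nlp_plus_ppi = evaluate(combine(nlp, ppi), gold)
print(nlp_only)
print(nlp_plus_ppi)
```

In this toy example the fallback raises coverage because PPI supplies a label for the one patient NLP could not classify, which mirrors the kind of complementary effect the abstract describes without reproducing its numbers.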