Next Generation Phenotyping Using the Unified Medical Language System

Background: Structured information within patient medical records represents a largely untapped treasure trove of research data. In the United States, privacy issues notwithstanding, this information has recently become more accessible thanks to the increasing adoption of electronic health...


Bibliographic Details
Main Authors: Adamusiak, Tomasz, Shimoyama, Naoki, Shimoyama, Mary
Format: Article
Language: English
Published: JMIR Publications 2014-03-01
Series: JMIR Medical Informatics
Online Access: http://medinform.jmir.org/2014/1/e5/
id doaj-f1d7314d96254a48a58a85453807e4e1
record_format Article
spelling doaj-f1d7314d96254a48a58a85453807e4e1 | 2021-05-03T01:41:16Z | eng | JMIR Publications | JMIR Medical Informatics | 2291-9694 | 2014-03-01 | vol. 2, no. 1, e5 | 10.2196/medinform.3172 | Next Generation Phenotyping Using the Unified Medical Language System | Adamusiak, Tomasz | Shimoyama, Naoki | Shimoyama, Mary | http://medinform.jmir.org/2014/1/e5/
collection DOAJ
language English
format Article
sources DOAJ
author Adamusiak, Tomasz
Shimoyama, Naoki
Shimoyama, Mary
title Next Generation Phenotyping Using the Unified Medical Language System
publisher JMIR Publications
series JMIR Medical Informatics
issn 2291-9694
publishDate 2014-03-01
description Background: Structured information within patient medical records represents a largely untapped treasure trove of research data. In the United States, privacy issues notwithstanding, this information has recently become more accessible thanks to the increasing adoption of electronic health records (EHR) and health care data standards fueled by the Meaningful Use legislation. The other side of the coin is that it is becoming increasingly difficult to navigate the profusion of disparate clinical terminology standards, which often span millions of concepts. Objective: The objective of our study was to develop a methodology for integrating large amounts of structured clinical information that is both terminology agnostic and able to capture heterogeneous clinical phenotypes including problems, procedures, medications, and clinical results (such as laboratory tests and clinical observations). In this context, we define phenotyping as the extraction of all clinically relevant features contained in the EHR. Methods: The scope of the project was framed by the Common Meaningful Use (MU) Dataset terminology standards: the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), RxNorm, the Logical Observation Identifiers Names and Codes (LOINC), the Current Procedural Terminology (CPT), the Healthcare Common Procedure Coding System (HCPCS), the International Classification of Diseases Ninth Revision Clinical Modification (ICD-9-CM), and the International Classification of Diseases Tenth Revision Clinical Modification (ICD-10-CM). The Unified Medical Language System (UMLS) was used as a mapping layer among the MU ontologies. An extract, load, and transform approach separated original annotations in the EHR from the mapping process and allowed for continuous updates as the terminologies were updated. Additionally, we integrated all terminologies into a single UMLS-derived ontology and further optimized it to make the relatively large concept graph manageable. Results: The initial evaluation was performed with simulated data from the Clinical Avatars project using 100,000 virtual patients undergoing a 90-day, genotype-guided warfarin dosing protocol. This dataset was annotated with standard MU terminologies, loaded, and transformed using the UMLS. We have deployed this methodology at scale in our in-house analytics platform using structured EHR data for 7931 patients (12 million clinical observations) treated at the Froedtert Hospital. A demonstration limited to Clinical Avatars data is available on the Internet using the credentials user “jmirdemo” and password “jmirdemo”. Conclusions: Despite its inherent complexity, the UMLS can serve as an effective interface terminology for many of the clinical data standards currently used in the health care domain.
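The mapping-layer idea in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the mapping is driven by rows in the UMLS MRCONSO.RRF layout (pipe-delimited, with the source abbreviation in SAB and the source code in CODE), and every CUI, code, label, patient identifier, and function name below is a hypothetical placeholder. The loaded annotations are left untouched and the concept identifiers are attached in a separate transform step, so the step can be rerun when the terminologies are updated.

```python
from collections import defaultdict

# Column order of the UMLS MRCONSO.RRF file (each row also ends with a pipe).
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def make_row(cui, sab, code, label):
    """Build one MRCONSO-style line; helper for the placeholder data only."""
    values = dict.fromkeys(MRCONSO_COLUMNS, "")
    values.update(CUI=cui, LAT="ENG", SAB=sab, CODE=code, STR=label)
    return "|".join(values[c] for c in MRCONSO_COLUMNS) + "|"


def build_index(rrf_lines):
    """Return {(SAB, CODE): {CUI, ...}} from MRCONSO-style lines."""
    index = defaultdict(set)
    for line in rrf_lines:
        row = dict(zip(MRCONSO_COLUMNS, line.rstrip("\n").split("|")))
        index[(row["SAB"], row["CODE"])].add(row["CUI"])
    return dict(index)


def transform(annotations, index):
    """Attach CUIs to copies of the loaded annotations (originals unchanged)."""
    return [
        {**a, "cuis": sorted(index.get((a["system"], a["code"]), ()))}
        for a in annotations
    ]


# Placeholder concept rows: not real UMLS content.
SAMPLE_ROWS = [
    make_row("C9000001", "ICD9CM", "000.0", "placeholder problem"),
    make_row("C9000001", "SNOMEDCT_US", "0000001", "placeholder problem"),
    make_row("C9000002", "RXNORM", "0000002", "placeholder drug"),
]

# Annotations as loaded from a (hypothetical) EHR extract.
LOADED = [
    {"patient": "p1", "system": "ICD9CM", "code": "000.0"},
    {"patient": "p2", "system": "SNOMEDCT_US", "code": "0000001"},
    {"patient": "p2", "system": "LNC", "code": "0000-0"},
]

index = build_index(SAMPLE_ROWS)
mapped = transform(LOADED, index)
for record in mapped:
    print(record)
```

In this sketch the first two annotations come from different terminologies but resolve to the same placeholder concept, which is the sense in which a shared concept layer makes queries terminology agnostic. The third annotation has no matching row and keeps an empty concept list rather than being dropped.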
url http://medinform.jmir.org/2014/1/e5/