Chronic Kidney Disease stratification using office visit records: Handling data imbalance via hierarchical meta-classification

Abstract. Background: Chronic Kidney Disease (CKD) is one of several conditions that affect a growing percentage of the US population; the disease is accompanied by multiple co-morbidities and is hard to diagnose in and of itself. In its advanced forms it carries severe outcomes and can lead to death...


Bibliographic Details
Main Authors: Moumita Bhattacharya, Claudine Jurkovitz, Hagit Shatkay
Format: Article
Language: English
Published: BMC 2018-12-01
Series: BMC Medical Informatics and Decision Making
Online Access: http://link.springer.com/article/10.1186/s12911-018-0675-x
id doaj-5aa811acd949448e83bc69f678afc0f7
record_format Article
spelling doaj-5aa811acd949448e83bc69f678afc0f7
2020-11-25T02:04:01Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2018-12-01
18
S4
35
44
10.1186/s12911-018-0675-x
Moumita Bhattacharya, Computational Biomedicine Lab, Computer and Information Sciences, University of Delaware
Claudine Jurkovitz, Value Institute, Christiana Care Health System
Hagit Shatkay, Computational Biomedicine Lab, Computer and Information Sciences, University of Delaware
collection DOAJ
language English
format Article
sources DOAJ
author Moumita Bhattacharya
Claudine Jurkovitz
Hagit Shatkay
title Chronic Kidney Disease stratification using office visit records: Handling data imbalance via hierarchical meta-classification
publisher BMC
series BMC Medical Informatics and Decision Making
issn 1472-6947
publishDate 2018-12-01
description Abstract. Background: Chronic Kidney Disease (CKD) is one of several conditions that affect a growing percentage of the US population; the disease is accompanied by multiple co-morbidities and is hard to diagnose in and of itself. In its advanced forms it carries severe outcomes and can lead to death. It is thus important to detect the disease as early as possible, which can help devise effective intervention and treatment plans. Here we investigate ways to utilize information available in electronic health records (EHRs) from regular office visits of more than 13,000 patients, in order to distinguish among several stages of the disease. While clinical data stored in EHRs provide valuable information for risk-stratification, one of the major challenges in using them arises from data imbalance. That is, records associated with a more severe condition are typically under-represented compared to those associated with a milder manifestation of the disease. To address imbalance, we propose and develop a sampling-based ensemble approach, hierarchical meta-classification, aiming to stratify CKD patients into severity stages, using simple quantitative non-text features gathered from standard office visit records. Methods: The proposed hierarchical meta-classification method frames the multiclass classification task as a hierarchy of two subtasks. The first is binary classification, separating records associated with the majority class from those associated with all minority classes combined, using meta-classification. The second subtask separates the records assigned to the combined minority classes into the individual constituent classes. Results: The proposed method identifies a significant proportion of patients suffering from the more advanced stages of the condition, while also correctly identifying most of the less severe cases, maintaining high sensitivity, specificity and F-measure (≥ 93%). Our results show that the high level of performance attained by our method is preserved even when the size of the training set is significantly reduced, demonstrating the stability and generalizability of our approach. Conclusion: We present a new approach to perform classification while addressing data imbalance, which is inherent in the biomedical domain. Our model effectively identifies severity stages of CKD patients, using information readily available in office visit records within the realistic context of high data imbalance.
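The two-subtask hierarchy described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the base learner is a toy nearest-centroid classifier, the learned meta-classifier is replaced by a plain majority vote over base learners trained on balanced samples, the class labels ("mild", "advanced_1", "advanced_2") are hypothetical, and the data are synthetic.

```python
import random
from collections import Counter


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy base learner (an assumption): predicts the label of the closest centroid."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids = {lab: centroid(rows) for lab, rows in groups.items()}
        return self

    def predict_one(self, row):
        return min(self.centroids, key=lambda lab: sq_dist(row, self.centroids[lab]))


class HierarchicalSketch:
    """Stage 1: majority vs. all minority classes combined.
    Stage 2: split predicted-minority records into the individual minority classes."""

    def __init__(self, majority_label, n_base=5, seed=0):
        self.majority_label = majority_label
        self.n_base = n_base  # odd, so the vote below cannot tie
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        major = [r for r, lab in zip(X, y) if lab == self.majority_label]
        minor = [(r, lab) for r, lab in zip(X, y) if lab != self.majority_label]
        minor_rows = [r for r, _ in minor]
        k = min(len(major), len(minor_rows))
        # Stage 1: each base learner sees all minority records plus an
        # equal-sized random sample of majority records (under-sampling).
        self.base = []
        for _ in range(self.n_base):
            sampled = rng.sample(major, k)
            Xb = sampled + minor_rows
            yb = ["majority"] * k + ["minority"] * len(minor_rows)
            self.base.append(NearestCentroid().fit(Xb, yb))
        # Stage 2: multiclass learner trained on minority records only.
        self.stage2 = NearestCentroid().fit(minor_rows, [lab for _, lab in minor])
        return self

    def predict_one(self, row):
        # Majority vote stands in for the learned meta-classifier (assumption).
        votes = Counter(m.predict_one(row) for m in self.base)
        if votes["majority"] >= votes["minority"]:
            return self.majority_label
        return self.stage2.predict_one(row)


def make_toy(seed=1):
    """Synthetic, imbalanced 2-D data; labels and cluster centers are made up."""
    rng = random.Random(seed)
    spec = {"mild": ((0.0, 0.0), 200),
            "advanced_1": ((6.0, 6.0), 20),
            "advanced_2": ((12.0, 0.0), 10)}
    X, y = [], []
    for label, (center, n) in spec.items():
        for _ in range(n):
            X.append([c + rng.gauss(0, 1) for c in center])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy()
    model = HierarchicalSketch(majority_label="mild").fit(X, y)
    for point in ([0.0, 0.0], [6.0, 6.0], [12.0, 0.0]):
        print(point, model.predict_one(point))
```

Training each first-stage learner on a balanced sample keeps the rare, more severe classes from being swamped by the majority class; a closer reading of the abstract would replace the vote with a classifier trained on the base learners' outputs.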
topic Imbalanced data
Meta-classification
Hierarchical classification
Electronic health records
Kidney disease
url http://link.springer.com/article/10.1186/s12911-018-0675-x