The effects of misclassification in routine healthcare databases on the accuracy of prognostic prediction models: a case study of the CHA2DS2-VASc score in atrial fibrillation
Abstract Background Research on prognostic prediction models frequently uses data from routine healthcare. However, potential misclassification of predictors when using such data may strongly affect the studied associations. There is no doubt that such misclassification could lead to the derivation...
Main Authors: | S. van Doorn, T. B. Brakenhoff, K. G. M. Moons, F. H. Rutten, A. W. Hoes, R. H. H. Groenwold, G. J. Geersing |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2017-11-01 |
Series: | Diagnostic and Prognostic Research |
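Subjects: | Routine care data; Validation; Prediction model; Atrial fibrillation; CHA2DS2-VASc; Misclassification |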
Online Access: | http://link.springer.com/article/10.1186/s41512-017-0018-x |
id |
doaj-c261a44af12b4a67beb3e91772f3aa2b |
---|---|
record_format |
Article |
spelling |
doaj-c261a44af12b4a67beb3e91772f3aa2b; 2020-11-24T21:39:12Z; eng; BMC; Diagnostic and Prognostic Research; 2397-7523; 2017-11-01; 1119; 10.1186/s41512-017-0018-x
The effects of misclassification in routine healthcare databases on the accuracy of prognostic prediction models: a case study of the CHA2DS2-VASc score in atrial fibrillation
S. van Doorn (0), T. B. Brakenhoff (1), K. G. M. Moons (2), F. H. Rutten (3), A. W. Hoes (4), R. H. H. Groenwold (5), G. J. Geersing (6)
Julius Center for Health Sciences and Primary care, University Medical Center Utrecht (affiliation listed for each of the seven authors)
http://link.springer.com/article/10.1186/s41512-017-0018-x
Routine care data; Validation; Prediction model; Atrial fibrillation; CHA2DS2-VASc; Misclassification |
collection |
DOAJ |
sources |
DOAJ |
author |
S. van Doorn T. B. Brakenhoff K. G. M. Moons F. H. Rutten A. W. Hoes R. H. H. Groenwold G. J. Geersing |
title |
The effects of misclassification in routine healthcare databases on the accuracy of prognostic prediction models: a case study of the CHA2DS2-VASc score in atrial fibrillation |
publisher |
BMC |
series |
Diagnostic and Prognostic Research |
issn |
2397-7523 |
publishDate |
2017-11-01 |
description |
Abstract
Background: Research on prognostic prediction models frequently uses data from routine healthcare. However, potential misclassification of predictors when using such data may strongly affect the studied associations. There is no doubt that such misclassification could lead to the derivation of suboptimal prediction models. The extent to which misclassification affects the validation of existing prediction models is currently unclear. We aimed to quantify the amount of misclassification in routine care data and its effect on the validation of an existing risk prediction model. As an illustrative example, we validated the CHA2DS2-VASc prediction rule for predicting mortality in patients with atrial fibrillation (AF).
Methods: In a prospective cohort in general practice in the Netherlands, we used data retrieved by computer from the electronic medical records of patients with known AF as index predictors. Additionally, data collected manually after scrutinizing all complete medical files were used as reference predictors. Comparing the index with the reference predictors, we assessed misclassification in individual predictors by calculating Cohen’s kappas and other diagnostic test accuracy measures. Predictive performance was quantified by the c-statistic and by determining calibration of multivariable models.
Results: In total, 2363 AF patients were included. After a median follow-up of 2.7 (IQR 2.3–3.0) years, 368 patients died (incidence rate 6.2 deaths per 100 person-years). Misclassification in individual predictors ranged from substantial (Cohen’s kappa 0.56 for prior history of heart failure) to minor (kappa 0.90 for a history of type 2 diabetes). The overall model performance was not affected when using either index or reference predictors, with a c-statistic of 0.684 and 0.681, respectively, and similar calibration.
Conclusion: In a case study validating the CHA2DS2-VASc prediction model, we found substantial predictor misclassification in routine healthcare data with only limited effect on overall model performance. Our study should be repeated for other frequently applied prediction models to further evaluate the usefulness of routinely available healthcare data for validating prognostic models in the presence of predictor misclassification. |
topic |
Routine care data Validation Prediction model Atrial fibrillation CHA2DS2-VASc Misclassification |
url |
http://link.springer.com/article/10.1186/s41512-017-0018-x |
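The abstract names two measures: Cohen's kappa for agreement between index and reference predictors, and the c-statistic for discrimination. The sketch below illustrates both on small made-up vectors. It is not the authors' code: the function names, the toy data, and the simplification of treating death as a binary outcome without censoring are all assumptions made here for illustration.

```python
from itertools import product


def cohen_kappa(index, reference):
    """Cohen's kappa for two equally long sequences of 0/1 labels."""
    n = len(index)
    if n == 0 or n != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: share of patients on whom both sources agree.
    observed = sum(i == r for i, r in zip(index, reference)) / n
    # Chance agreement from the marginal prevalence in each source.
    p_index = sum(index) / n
    p_reference = sum(reference) / n
    expected = p_index * p_reference + (1 - p_index) * (1 - p_reference)
    if expected == 1.0:
        # Both sources are constant and identical; kappa is undefined,
        # so perfect agreement is returned by convention (an assumption).
        return 1.0
    return (observed - expected) / (1 - expected)


def c_statistic(scores, died):
    """Share of (event, non-event) pairs in which the event has the
    higher score; ties count one half. Censoring is ignored (assumption)."""
    events = [s for s, d in zip(scores, died) if d]
    non_events = [s for s, d in zip(scores, died) if not d]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for event_score, other_score in product(events, non_events):
        if event_score > other_score:
            concordant += 1.0
        elif event_score == other_score:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: heart-failure flag from the electronic record
# (index) versus manual file review (reference) for eight patients.
index_hf = [1, 0, 0, 1, 0, 0, 1, 0]
reference_hf = [1, 1, 0, 1, 0, 0, 0, 0]
kappa = cohen_kappa(index_hf, reference_hf)

# Hypothetical risk scores and death indicators for six patients.
toy_scores = [1, 2, 3, 4, 5, 6]
toy_died = [0, 0, 1, 0, 1, 1]
c_stat = c_statistic(toy_scores, toy_died)
```

Computing `c_statistic` once with scores built from index predictors and once with scores built from reference predictors would mirror the comparison reported in the abstract (0.684 versus 0.681), under the simplifying assumptions stated above.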