The effects of misclassification in routine healthcare databases on the accuracy of prognostic prediction models: a case study of the CHA2DS2-VASc score in atrial fibrillation

Abstract. Background: Research on prognostic prediction models frequently uses data from routine healthcare. However, potential misclassification of predictors when using such data may strongly affect the studied associations. There is no doubt that such misclassification could lead to the derivation...

Bibliographic Details
Main Authors: S. van Doorn, T. B. Brakenhoff, K. G. M. Moons, F. H. Rutten, A. W. Hoes, R. H. H. Groenwold, G. J. Geersing
Format: Article
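Language: English
Published: BMC 2017-11-01
Series: Diagnostic and Prognostic Research
Subjects: Routine care data; Validation; Prediction model; Atrial fibrillation; CHA2DS2-VASc; Misclassification
Online Access: http://link.springer.com/article/10.1186/s41512-017-0018-x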
id doaj-c261a44af12b4a67beb3e91772f3aa2b
record_format Article
spelling doaj-c261a44af12b4a67beb3e91772f3aa2b
2020-11-24T21:39:12Z
eng
BMC
Diagnostic and Prognostic Research
2397-7523
2017-11-01
1119
10.1186/s41512-017-0018-x
The effects of misclassification in routine healthcare databases on the accuracy of prognostic prediction models: a case study of the CHA2DS2-VASc score in atrial fibrillation
S. van Doorn; T. B. Brakenhoff; K. G. M. Moons; F. H. Rutten; A. W. Hoes; R. H. H. Groenwold; G. J. Geersing
Julius Center for Health Sciences and Primary care, University Medical Center Utrecht (affiliation listed for each of the seven authors)
Abstract. Background: Research on prognostic prediction models frequently uses data from routine healthcare. However, potential misclassification of predictors when using such data may strongly affect the studied associations. There is no doubt that such misclassification could lead to the derivation of suboptimal prediction models. The extent to which misclassification affects the validation of existing prediction models is currently unclear. We aimed to quantify the amount of misclassification in routine care data and its effect on the validation of the existing risk prediction model. As an illustrative example, we validated the CHA2DS2-VASc prediction rule for predicting mortality in patients with atrial fibrillation (AF). Methods: In a prospective cohort in general practice in the Netherlands, we used computerized retrieved data from the electronic medical records of patients known with AF as index predictors. Additionally, manually collected data after scrutinizing all complete medical files were used as reference predictors. Comparing the index with the reference predictors, we assessed misclassification in individual predictors by calculating Cohen’s kappas and other diagnostic test accuracy measures. Predictive performance was quantified by the c-statistic and by determining calibration of multivariable models. Results: In total, 2363 AF patients were included. After a median follow-up of 2.7 (IQR 2.3–3.0) years, 368 patients died (incidence rate 6.2 deaths per 100 person-years). Misclassification in individual predictors ranged from substantial (Cohen’s kappa 0.56 for prior history of heart failure) to minor (kappa 0.90 for a history of type 2 diabetes). The overall model performance was not affected when using either index or reference predictors, with a c-statistic of 0.684 and 0.681, respectively, and similar calibration. Conclusion: In a case study validating the CHA2DS2-VASc prediction model, we found substantial predictor misclassification in routine healthcare data with only limited effect on overall model performance. Our study should be repeated for other often applied prediction models to further evaluate the usefulness of routinely available healthcare data for validating prognostic models in the presence of predictor misclassification.
http://link.springer.com/article/10.1186/s41512-017-0018-x
Routine care data; Validation; Prediction model; Atrial fibrillation; CHA2DS2-VASc; Misclassification
collection DOAJ
language English
format Article
sources DOAJ
author S. van Doorn
T. B. Brakenhoff
K. G. M. Moons
F. H. Rutten
A. W. Hoes
R. H. H. Groenwold
G. J. Geersing
title The effects of misclassification in routine healthcare databases on the accuracy of prognostic prediction models: a case study of the CHA2DS2-VASc score in atrial fibrillation
publisher BMC
series Diagnostic and Prognostic Research
issn 2397-7523
publishDate 2017-11-01
description Abstract
Background: Research on prognostic prediction models frequently uses data from routine healthcare. However, potential misclassification of predictors when using such data may strongly affect the studied associations. There is no doubt that such misclassification could lead to the derivation of suboptimal prediction models. The extent to which misclassification affects the validation of existing prediction models is currently unclear. We aimed to quantify the amount of misclassification in routine care data and its effect on the validation of the existing risk prediction model. As an illustrative example, we validated the CHA2DS2-VASc prediction rule for predicting mortality in patients with atrial fibrillation (AF).
Methods: In a prospective cohort in general practice in the Netherlands, we used computerized retrieved data from the electronic medical records of patients known with AF as index predictors. Additionally, manually collected data after scrutinizing all complete medical files were used as reference predictors. Comparing the index with the reference predictors, we assessed misclassification in individual predictors by calculating Cohen’s kappas and other diagnostic test accuracy measures. Predictive performance was quantified by the c-statistic and by determining calibration of multivariable models.
Results: In total, 2363 AF patients were included. After a median follow-up of 2.7 (IQR 2.3–3.0) years, 368 patients died (incidence rate 6.2 deaths per 100 person-years). Misclassification in individual predictors ranged from substantial (Cohen’s kappa 0.56 for prior history of heart failure) to minor (kappa 0.90 for a history of type 2 diabetes). The overall model performance was not affected when using either index or reference predictors, with a c-statistic of 0.684 and 0.681, respectively, and similar calibration.
Conclusion: In a case study validating the CHA2DS2-VASc prediction model, we found substantial predictor misclassification in routine healthcare data with only limited effect on overall model performance. Our study should be repeated for other often applied prediction models to further evaluate the usefulness of routinely available healthcare data for validating prognostic models in the presence of predictor misclassification.
topic Routine care data
Validation
Prediction model
Atrial fibrillation
CHA2DS2-VASc
Misclassification
url http://link.springer.com/article/10.1186/s41512-017-0018-x
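
Illustrative note on the measures named in the abstract: the Methods compare index and reference predictors with Cohen's kappa and summarize predictive performance with the c-statistic. The sketch below shows how these two quantities can be computed for binary data. It is not the authors' analysis code. The function names and the toy vectors are hypothetical, and the c-statistic here assumes a simple binary outcome with no censoring, which may differ from the model used in the study.

```python
from itertools import product


def cohens_kappa(index, reference):
    """Cohen's kappa for two equally long sequences of 0/1 labels."""
    if len(index) != len(reference) or not index:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(index)
    # Observed agreement: share of patients with identical labels.
    observed = sum(i == r for i, r in zip(index, reference)) / n
    # Chance agreement from the marginal prevalence in each source.
    p_index = sum(index) / n
    p_ref = sum(reference) / n
    expected = p_index * p_ref + (1 - p_index) * (1 - p_ref)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def c_statistic(scores, outcomes):
    """Concordance for a binary outcome (assumption: no censoring).

    Share of (event, non-event) pairs in which the event has the
    higher score; tied scores count as half a concordant pair.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, ne in product(events, non_events):
        if e > ne:
            concordant += 1.0
        elif e == ne:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data, not taken from the study.
index_hf = [1, 0, 0, 1, 0, 0, 1, 0]      # predictor from automated extraction
reference_hf = [1, 1, 0, 1, 0, 0, 0, 0]  # same predictor after manual review
risk_score = [1, 2, 2, 3, 4, 5]          # e.g. a points-based risk score
died = [0, 0, 1, 0, 1, 1]

print(round(cohens_kappa(index_hf, reference_hf), 3))
print(round(c_statistic(risk_score, died), 3))
```

Running the same c_statistic call once with scores built from index predictors and once with scores built from reference predictors mirrors the comparison reported in the Results (0.684 versus 0.681).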