Summary: | Rheumatoid arthritis (RA) is a chronic disease characterized by an overactive immune system and joint inflammation. Population-based administrative health data (AHD) are widely used for RA outcomes research and surveillance. However, AHD may not capture all cases of RA in the population. Capture-recapture (CR) methods have been proposed to describe the completeness of AHD for estimating disease population size, but AHD may not conform to the assumptions that underlie CR models. A Monte Carlo simulation study was used to investigate the effects of two assumption violations in two-source CR methods: dependence between data sources and heterogeneity of capture probabilities. We compared the Chapman estimator and an estimator based on the multinomial logistic regression model (MLRM) on relative bias (RB), coverage probability (CP) of 95% confidence intervals, width of 95% confidence intervals (WCI), and root mean square error (RMSE) of prevalence estimates. The effects of misspecification of the MLRM were also investigated. In addition, the Chapman and MLRM estimators were used to estimate RA prevalence using AHD from Saskatchewan, Canada. CR methods consistently underestimated population size when the assumptions were violated. The two estimators produced similar population size estimates and differed substantially only in RMSE. Parameter estimates became biased when the MLRM was misspecified, but there was little impact on population size estimates. In conclusion, CR methods are recommended to reduce bias in prevalence estimates based on AHD. Because these methods may be sensitive to assumption violations, researchers should consider potential dependence between data sources. In addition, sufficient overlap in the cases captured by each data source (e.g., 50% of the cases captured by both data sources) or balanced capture probabilities across data sources is needed to implement these methods effectively.
Researchers who estimate population size from AHD using CR methods should favour the MLRM estimator over the Chapman estimator.
|
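For readers unfamiliar with two-source CR, the following is a minimal sketch of the standard Chapman estimator with a normal-approximation 95% confidence interval. It is an illustration only, not the study's own code: the function name `chapman_estimate`, its arguments, and the example counts are hypothetical. The variance used is the commonly cited formula for the Chapman estimator, and the interval is a simple Wald interval, which may differ from the one used in the study.

```python
import math


def chapman_estimate(n1, n2, m, z=1.96):
    """Two-source Chapman estimate of population size (illustrative sketch).

    n1: cases captured by source 1
    n2: cases captured by source 2
    m:  cases captured by both sources (the overlap)
    z:  normal quantile for the interval (1.96 gives roughly 95%)
    """
    if m < 0 or m > min(n1, n2):
        raise ValueError("overlap m must satisfy 0 <= m <= min(n1, n2)")

    # Chapman's bias-corrected form of the Lincoln-Petersen estimator.
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1

    # Commonly cited variance estimate for the Chapman estimator.
    var = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / ((m + 1) ** 2 * (m + 2))
    se = math.sqrt(var)

    # Wald-type interval; an assumption of this sketch.
    return n_hat, (n_hat - z * se, n_hat + z * se)


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    estimate, (lower, upper) = chapman_estimate(n1=200, n2=150, m=50)
    print(f"Estimated population size: {estimate:.1f}")
    print(f"Approximate 95% CI: ({lower:.1f}, {upper:.1f})")
```

The estimator assumes independent sources and homogeneous capture probabilities, which are the two assumptions whose violations the simulation study examines.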