Data quality assessment and subsampling strategies to correct distributional bias in prevalence studies
Main Authors: | A. D’Ambrosio, J. Garlasco, F. Quattrocolo, C. Vicentini, C. M. Zotti |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2021-04-01 |
Series: | BMC Medical Research Methodology |
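Subjects: | Healthcare associated infections; Prevalence studies; Sampling; Data quality; Methodology; Bias correction |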
Online Access: | https://doi.org/10.1186/s12874-021-01277-y |
id | doaj-6fc555ebc4784c9b956bbf733cd44e89 |
---|---|
affiliation | Department of Public Health and Paediatric Sciences, University of Turin (all five authors) |
collection | DOAJ |
author | A. D’Ambrosio, J. Garlasco, F. Quattrocolo, C. Vicentini, C. M. Zotti |
issn | 1471-2288 |
description | Abstract. Background: Healthcare-associated infections (HAIs) represent a major public health issue. Hospital-based prevalence studies are a common tool of HAI surveillance, but data quality problems and non-representativeness can undermine their reliability. Methods: This study proposes three algorithms that, given a convenience sample and the variables relevant to the study outcome, select a subsample with specific distributional characteristics, improving either representativeness (Probability and Distance procedures) or the balance of risk factors (Uniformity procedure). A “Quality Score” (QS) was also developed to grade sampled units according to data completeness and reliability. The methods were evaluated through bootstrapping on a convenience sample of 135 hospitals collected during the 2016 Italian Point Prevalence Survey (PPS) on HAIs. Results: The QS revealed wide variation in data quality among hospitals (median QS 52.9 points, range 7.98–628; lower values indicate better quality), with most problems attributable to the reporting of ward- and hospital-related data. Both the Distance and Probability procedures produced subsamples with lower distributional bias (Log-likelihood score increased from 7.3 to 29 points). The Uniformity procedure increased the homogeneity of the sample characteristics (e.g., −58.4% in geographical variability). The procedures selected hospitals with higher data quality, especially the Probability procedure (lower QS in 100% of bootstrap simulations). The Distance procedure produced a lower HAI prevalence estimate (6.98% compared with 7.44% in the convenience sample), more in line with the European median. Conclusions: The QS and the subsampling procedures proposed in this study could be effective tools for improving the quality of prevalence studies, reducing the biases that can arise from non-probabilistic sample collection. |
topic | Healthcare associated infections; Prevalence studies; Sampling; Data quality; Methodology; Bias correction |
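
The abstract describes the Distance procedure only at a high level: from a convenience sample, select a subsample whose distribution of outcome-relevant variables is closer to a reference population. As a rough illustration of that general idea, the sketch below applies greedy backward elimination to a single categorical variable, using total variation distance as the measure of distributional bias. The function names (`tv_distance`, `greedy_distance_subsample`), the choice of distance, the stopping rule and the example data are all hypothetical assumptions; this is not the authors' Distance procedure, which handles several variables and whose exact rules are given in the paper.

```python
from collections import Counter


def tv_distance(labels, reference):
    """Total variation distance between the empirical distribution of
    `labels` and a reference distribution (dict: category -> proportion)."""
    n = len(labels)
    if n == 0:
        raise ValueError("cannot compare an empty sample")
    counts = Counter(labels)
    categories = set(counts) | set(reference)
    return 0.5 * sum(
        abs(counts.get(c, 0) / n - reference.get(c, 0.0)) for c in categories
    )


def greedy_distance_subsample(labels, reference, min_size):
    """Hypothetical greedy backward elimination (illustration only).

    Repeatedly drops the single unit whose removal most lowers the total
    variation distance to `reference`; stops when no removal helps or when
    the subsample would shrink below `min_size`. Returns the retained
    indices in ascending order.
    """
    min_size = max(min_size, 1)  # never empty the sample
    kept = list(range(len(labels)))
    current = tv_distance(labels, reference)
    while len(kept) > min_size:
        best_pos, best_dist = None, current
        for pos in range(len(kept)):
            # Candidate subsample: everything kept except position `pos`.
            trial = [labels[j] for k, j in enumerate(kept) if k != pos]
            dist = tv_distance(trial, reference)
            if dist < best_dist:  # strict: ties keep the earliest candidate
                best_pos, best_dist = pos, dist
        if best_pos is None:  # no single removal reduces the distance
            break
        kept.pop(best_pos)  # pop preserves ascending order of indices
        current = best_dist
    return kept


if __name__ == "__main__":
    # Made-up data: one categorical variable (e.g. region) per hospital.
    labels = ["north"] * 6 + ["centre"] * 2 + ["south"] * 2
    reference = {"north": 0.4, "centre": 0.3, "south": 0.3}
    kept = greedy_distance_subsample(labels, reference, min_size=5)
    print(kept)  # → [3, 4, 5, 6, 7, 8, 9]
```

With these made-up inputs the sketch drops three "north" units and then stops, because no further single removal lowers the distance (which falls from 0.2 to roughly 0.03). Greedy elimination is simple but is not guaranteed to find the best possible subsample.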