Data quality assessment and subsampling strategies to correct distributional bias in prevalence studies

Abstract Background Healthcare-associated infections (HAIs) represent a major Public Health issue. Hospital-based prevalence studies are a common tool of HAI surveillance, but data quality problems and non-representativeness can undermine their reliability. Methods This study proposes three algorith...

Bibliographic Details
Main Authors: A. D’Ambrosio, J. Garlasco, F. Quattrocolo, C. Vicentini, C. M. Zotti
Format: Article
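Language: English
Published: BMC 2021-04-01
Series: BMC Medical Research Methodology
Subjects: Healthcare associated infections; Prevalence studies; Sampling; Data quality; Methodology; Bias correction
Online Access: https://doi.org/10.1186/s12874-021-01277-y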
id doaj-6fc555ebc4784c9b956bbf733cd44e89
record_format Article
spelling doaj-6fc555ebc4784c9b956bbf733cd44e89; 2021-05-02T11:03:08Z; eng; BMC; BMC Medical Research Methodology; 1471-2288; 2021-04-01; 21; 1; 1; 14; 10.1186/s12874-021-01277-y; Data quality assessment and subsampling strategies to correct distributional bias in prevalence studies; A. D’Ambrosio, J. Garlasco, F. Quattrocolo, C. Vicentini, C. M. Zotti (all: Department of Public Health and Paediatric Sciences, University of Turin)
Abstract. Background: Healthcare-associated infections (HAIs) represent a major Public Health issue. Hospital-based prevalence studies are a common tool of HAI surveillance, but data quality problems and non-representativeness can undermine their reliability. Methods: This study proposes three algorithms that, given a convenience sample and variables relevant for the outcome of the study, select a subsample with specific distributional characteristics, boosting either representativeness (Probability and Distance procedures) or risk factors’ balance (Uniformity procedure). A “Quality Score” (QS) was also developed to grade sampled units according to data completeness and reliability. The methodologies were evaluated through bootstrapping on a convenience sample of 135 hospitals collected during the 2016 Italian Point Prevalence Survey (PPS) on HAIs. Results: The QS highlighted wide variations in data quality among hospitals (median QS 52.9 points, range 7.98–628, lower meaning better quality), with most problems ascribable to ward and hospital-related data reporting. Both Distance and Probability procedures produced subsamples with lower distributional bias (Log-likelihood score increased from 7.3 to 29 points). The Uniformity procedure increased the homogeneity of the sample characteristics (e.g., −58.4% in geographical variability). The procedures selected hospitals with higher data quality, especially the Probability procedure (lower QS in 100% of bootstrap simulations). The Distance procedure produced lower HAI prevalence estimates (6.98% compared to 7.44% in the convenience sample), more in line with the European median. Conclusions: The QS and the subsampling procedures proposed in this study could represent effective tools to improve the quality of prevalence studies, decreasing the biases that can arise due to non-probabilistic sample collection.
https://doi.org/10.1186/s12874-021-01277-y
Healthcare associated infections; Prevalence studies; Sampling; Data quality; Methodology; Bias correction
collection DOAJ
language English
format Article
sources DOAJ
author A. D’Ambrosio
J. Garlasco
F. Quattrocolo
C. Vicentini
C. M. Zotti
title Data quality assessment and subsampling strategies to correct distributional bias in prevalence studies
publisher BMC
series BMC Medical Research Methodology
issn 1471-2288
publishDate 2021-04-01
description Abstract. Background: Healthcare-associated infections (HAIs) represent a major Public Health issue. Hospital-based prevalence studies are a common tool of HAI surveillance, but data quality problems and non-representativeness can undermine their reliability. Methods: This study proposes three algorithms that, given a convenience sample and variables relevant for the outcome of the study, select a subsample with specific distributional characteristics, boosting either representativeness (Probability and Distance procedures) or risk factors’ balance (Uniformity procedure). A “Quality Score” (QS) was also developed to grade sampled units according to data completeness and reliability. The methodologies were evaluated through bootstrapping on a convenience sample of 135 hospitals collected during the 2016 Italian Point Prevalence Survey (PPS) on HAIs. Results: The QS highlighted wide variations in data quality among hospitals (median QS 52.9 points, range 7.98–628, lower meaning better quality), with most problems ascribable to ward and hospital-related data reporting. Both Distance and Probability procedures produced subsamples with lower distributional bias (Log-likelihood score increased from 7.3 to 29 points). The Uniformity procedure increased the homogeneity of the sample characteristics (e.g., −58.4% in geographical variability). The procedures selected hospitals with higher data quality, especially the Probability procedure (lower QS in 100% of bootstrap simulations). The Distance procedure produced lower HAI prevalence estimates (6.98% compared to 7.44% in the convenience sample), more in line with the European median. Conclusions: The QS and the subsampling procedures proposed in this study could represent effective tools to improve the quality of prevalence studies, decreasing the biases that can arise due to non-probabilistic sample collection.
topic Healthcare associated infections
Prevalence studies
Sampling
Data quality
Methodology
Bias correction
url https://doi.org/10.1186/s12874-021-01277-y