Comparing Methods for Record Linkage for Public Health Action: Matching Algorithm Validation Study

Background: Many public health departments use record linkage between surveillance data and external data sources to inform public health interventions. However, little guidance is available to inform these activities, and many health departments rely on deterministic algorithm...


Bibliographic Details
Main Authors:	Avoundjian, Tigran, Dombrowski, Julia C, Golden, Matthew R, Hughes, James P, Guthrie, Brandon L, Baseman, Janet, Sadinle, Mauricio
Format:	Article
Language:	English
Published:	JMIR Publications 2020-04-01
Series:	JMIR Public Health and Surveillance
Online Access:	http://publichealth.jmir.org/2020/2/e15917/

id	doaj-17f5fb4ad7a24231bb8baf6456fdb218
record_format	Article
spelling	doaj-17f5fb4ad7a24231bb8baf6456fdb218; 2021-05-02T19:35:20Z; eng; JMIR Publications; JMIR Public Health and Surveillance; ISSN 2369-2960; 2020-04-01; vol 6, no 2, e15917; doi:10.2196/15917
collection	DOAJ
language	English
format	Article
sources	DOAJ
title	Comparing Methods for Record Linkage for Public Health Action: Matching Algorithm Validation Study
publisher	JMIR Publications
series	JMIR Public Health and Surveillance
issn	2369-2960
publishDate	2020-04-01
description	Background: Many public health departments use record linkage between surveillance data and external data sources to inform public health interventions. However, little guidance is available to inform these activities, and many health departments rely on deterministic algorithms that may miss many true matches. In the context of public health action, these missed matches lead to missed opportunities to deliver interventions and may exacerbate existing health inequities.
Objective: This study aimed to compare the performance of record linkage algorithms commonly used in public health practice.
Methods: We compared five deterministic (exact, Stenger, Ocampo 1, Ocampo 2, and Bosh) and two probabilistic (fastLink and beta record linkage [BRL]) record linkage algorithms using simulations and a real-world scenario. We simulated pairs of datasets, varying both the number of errors per record and the number of matching records between the two datasets (ie, overlap). We matched the datasets using each algorithm and calculated their recall (ie, sensitivity, the proportion of true matches identified by the algorithm) and precision (ie, positive predictive value, the proportion of matches identified by the algorithm that were true matches). We estimated the average computation time by performing a match with each algorithm 20 times while varying the size of the datasets being matched. In the real-world scenario, HIV and sexually transmitted disease surveillance data from King County, Washington, were matched to identify people living with HIV who had a syphilis diagnosis in 2017. We calculated the recall and precision of each algorithm against a composite standard based on agreement in matching decisions across all the algorithms and on manual review.
Results: In simulations, BRL and fastLink maintained high recall at nearly all data quality levels while remaining comparable with the deterministic algorithms in precision. Deterministic algorithms typically failed to identify matches in scenarios with low data quality. All the deterministic algorithms had a shorter average computation time than the probabilistic algorithms. BRL had the slowest overall computation time (14 min when both datasets contained 2000 records). In the real-world scenario, BRL had the smallest trade-off between recall (309/309, 100.0%) and precision (309/312, 99.0%).
Conclusions: Probabilistic record linkage algorithms maximize the number of true matches identified, reducing gaps in the coverage of interventions and maximizing the reach of public health action.
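The recall and precision definitions in the Methods paragraph can be made concrete with a minimal Python sketch. It assumes each record is reduced to a tuple of identifier fields and that a match is a pair of record IDs. The `exact_match` rule and all record IDs and field values are hypothetical illustrations. This is not the study's code, and it does not implement the Stenger, Ocampo, Bosh, fastLink, or BRL algorithms.

```python
from typing import Iterable

# A candidate match: (record ID in dataset A, record ID in dataset B).
Pair = tuple[str, str]


def recall_precision(predicted: Iterable[Pair], truth: Iterable[Pair]) -> tuple[float, float]:
    """Recall = true matches found / all true matches.
    Precision = true matches found / all matches the algorithm declared."""
    predicted_set, truth_set = set(predicted), set(truth)
    true_positives = len(predicted_set & truth_set)
    recall = true_positives / len(truth_set) if truth_set else 0.0
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    return recall, precision


def exact_match(a: dict[str, tuple], b: dict[str, tuple]) -> set[Pair]:
    """Hypothetical 'exact' deterministic rule: link two records only when
    every identifier field agrees character for character."""
    index: dict[tuple, list[str]] = {}
    for b_id, fields in b.items():
        index.setdefault(fields, []).append(b_id)
    return {(a_id, b_id) for a_id, fields in a.items() for b_id in index.get(fields, [])}


# Hypothetical records: a2/b2 are the same person, but the birth date in b2
# has a transposed month and day, so exact matching misses that pair.
a = {"a1": ("SMITH", "JOHN", "1980-01-02"), "a2": ("DOE", "JANE", "1975-05-06")}
b = {"b1": ("SMITH", "JOHN", "1980-01-02"), "b2": ("DOE", "JANE", "1975-06-05")}
links = exact_match(a, b)
print(recall_precision(links, {("a1", "b1"), ("a2", "b2")}))
```

A single recording error is enough for the exact rule to miss a true match, which lowers recall while leaving precision untouched. The same function reproduces the arithmetic of the real-world result: 309 correct links out of 309 true matches gives recall 1.0, and 309 correct links out of 312 declared links gives precision of about 0.990.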
url	http://publichealth.jmir.org/2020/2/e15917/


Similar Items