An instrumental variable approach to estimation of match probabilities or precision in linked data

Background with rationale: While probabilistic linkage methods are ostensibly based on the probabilities of record pairs being matches (i.e. their marginal precision or positive predictive value), in practice they are used principally for ranking candidate links and fall short of supporting estimati...

Bibliographic Details
Main Author: James Doidge
Format: Article
Language: English
Published: Swansea University 2019-11-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/1258
id doaj-51c726cdc40943189496cd5a0952db55
record_format Article
spelling doaj-51c726cdc40943189496cd5a0952db55 2020-11-25T00:43:36Z eng Swansea University International Journal of Population Data Science 2399-4908 2019-11-01 4 3 10.23889/ijpds.v4i3.1258 An instrumental variable approach to estimation of match probabilities or precision in linked data James Doidge 0 Intensive Care National Audit and Research Centre (ICNARC)
Background with rationale: While probabilistic linkage methods are ostensibly based on the probabilities of record pairs being matches (i.e. their marginal precision or positive predictive value), in practice they are used principally for ranking candidate links and fall short of supporting estimation of absolute probabilities. A few variations on Fellegi and Sunter’s framework have been proposed to better accommodate the dependencies that limit transformation of match weights into match probabilities, but there are almost no alternative frameworks for match probability estimation.
Main Aim: To explore the feasibility, accuracy and limitations of a novel instrumental variable approach to estimation of match probabilities for use in either probabilistic record linkage or evaluation of linkage error.
Methods/Approach: Using both simulated data and a gold standard (labelled) dataset derived from real-world linked data, I assessed the accuracy of match probability estimation for a range of potential instruments and compared results to estimates produced using conventional probabilistic techniques.
Results: The technique involves trading the potential value of one matching variable in discriminating between candidate links for improved estimation of match probabilities within groups of otherwise similar candidates. Analysis of simulated data confirmed the theoretical validity of the approach in supporting unbiased estimation of match probabilities despite dependencies between other matching variables. Analysis of real-world data demonstrated feasibility in terms of the availability of real-world instruments that provided sufficiently accurate estimation in groups of candidate links above a minimum size. Invalid instruments produced estimates that could be strongly biased.
Conclusion: These early results are promising but the general availability of valid instruments, their ‘affordability’ in terms of sacrificed discrimination, and means for identifying valid instruments remain unclear. However, this approach represents a new variety of tool for the data linker’s toolkit, which may provide a useful angle on an otherwise difficult-to-estimate parameter and have applications yet to be envisaged.
https://ijpds.org/article/view/1258
collection DOAJ
language English
format Article
sources DOAJ
author James Doidge
title An instrumental variable approach to estimation of match probabilities or precision in linked data
publisher Swansea University
series International Journal of Population Data Science
issn 2399-4908
publishDate 2019-11-01
url https://ijpds.org/article/view/1258