An instrumental variable approach to estimation of match probabilities or precision in linked data
Background with rationale: While probabilistic linkage methods are ostensibly based on the probabilities of record pairs being matches (i.e. their marginal precision or positive predictive value), in practice they are used principally for ranking candidate links and fall short of supporting estimati...
Main Author: | James Doidge |
---|---|
Format: | Article |
Language: | English |
Published: | Swansea University, 2019-11-01 |
Series: | International Journal of Population Data Science |
Online Access: | https://ijpds.org/article/view/1258 |
id |
doaj-51c726cdc40943189496cd5a0952db55 |
---|---|
record_format |
Article |
spelling |
James Doidge, Intensive Care National Audit and Research Centre (ICNARC). An instrumental variable approach to estimation of match probabilities or precision in linked data. International Journal of Population Data Science, vol. 4, no. 3 (2019-11-01). Swansea University. ISSN 2399-4908. Language: eng. DOI: 10.23889/ijpds.v4i3.1258. Record timestamp: 2020-11-25T00:43:36Z. https://ijpds.org/article/view/1258 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
James Doidge |
title |
An instrumental variable approach to estimation of match probabilities or precision in linked data |
publisher |
Swansea University |
series |
International Journal of Population Data Science |
issn |
2399-4908 |
publishDate |
2019-11-01 |
description |
Background with rationale
While probabilistic linkage methods are ostensibly based on the probabilities of record pairs being matches (i.e. their marginal precision or positive predictive value), in practice they are used principally for ranking candidate links and fall short of supporting estimation of absolute probabilities. A few variations on Fellegi and Sunter’s framework have been proposed to better accommodate the dependencies that limit transformation of match weights into match probabilities, but there are almost no alternative frameworks for match probability estimation.
Main Aim
To explore the feasibility, accuracy and limitations of a novel instrumental variable approach to estimation of match probabilities for use in either probabilistic record linkage or evaluation of linkage error.
Methods/Approach
Using both simulated data and a gold standard (labelled) dataset derived from real-world linked data, I assessed the accuracy of match probability estimation for a range of potential instruments and compared results to estimates produced using conventional probabilistic techniques.
Results
The technique involves trading the potential value of one matching variable in discriminating between candidate links for improved estimation of match probabilities within groups of otherwise similar candidates. Analysis of simulated data confirmed the theoretical validity of the approach in supporting unbiased estimation of match probabilities despite dependencies between other matching variables. Analysis of real-world data demonstrated feasibility in terms of the availability of real-world instruments that provided sufficiently accurate estimation in groups of candidate links above a minimum size. Invalid instruments produced estimates that could be strongly biased.
Conclusion
These early results are promising but the general availability of valid instruments, their ‘affordability’ in terms of sacrificed discrimination, and means for identifying valid instruments remain unclear. However, this approach represents a new variety of tool for the data linker’s toolkit, which may provide a useful angle on an otherwise difficult-to-estimate parameter and have applications yet to be envisaged.
|
url |
https://ijpds.org/article/view/1258 |
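
The abstract describes estimating match probabilities "within groups of otherwise similar candidates" by setting one matching variable aside as an instrument, but it does not spell out the estimator. The sketch below is one plausible reading, not necessarily the author's method. It assumes that, given true match status, agreement on the instrument is independent of the other matching variables, and that the agreement rates among matches (`m`) and non-matches (`u`) are known. Under those assumptions the observed agreement rate in a group is `p*m + (1-p)*u`, which can be solved for the group's match probability `p`. All function names, parameter values and simulated data are hypothetical.

```python
import random


def iv_match_probability(agreements, m, u):
    """Estimate the proportion of true matches in a group of candidate links.

    agreements: iterable of 0/1 flags, 1 if the instrument agrees for that pair.
    m: assumed P(instrument agrees | match).
    u: assumed P(instrument agrees | non-match).
    """
    agreements = list(agreements)
    if not agreements:
        raise ValueError("group is empty")
    if m <= u:
        raise ValueError("instrument must agree more often among matches (m > u)")
    observed = sum(agreements) / len(agreements)
    # observed = p*m + (1 - p)*u  =>  p = (observed - u) / (m - u)
    estimate = (observed - u) / (m - u)
    # Sampling noise can push the raw estimate outside [0, 1], so clip it.
    return min(1.0, max(0.0, estimate))


def simulate_group(n, p, m, u, rng):
    """Simulate instrument agreement flags for n candidate links (hypothetical data)."""
    flags = []
    for _ in range(n):
        is_match = rng.random() < p
        agree_prob = m if is_match else u
        flags.append(1 if rng.random() < agree_prob else 0)
    return flags


rng = random.Random(0)
group = simulate_group(n=20000, p=0.7, m=0.95, u=0.05, rng=rng)
p_hat = iv_match_probability(group, m=0.95, u=0.05)
print(round(p_hat, 2))
```

Because the estimate divides by `m - u`, a weakly discriminating instrument inflates sampling error, and small groups give noisy estimates, which is consistent with the abstract's remark about needing groups above a minimum size. If the independence assumption fails (an invalid instrument), the same formula returns a biased value.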