Assessing the impact of different grouping methods: time to rethink and regroup?
Main Authors: | Sean Randall, Anna Ferrante, Adrian Brown, James Boyd, James Semmens |
---|---|
Format: | Article |
Language: | English |
Published: | Swansea University, 2017-04-01 |
Series: | International Journal of Population Data Science |
Online Access: | https://ijpds.org/article/view/155 |
id |
doaj-77f32a4e2556451d95ac6faf42c624da |
---|---|
record_format |
Article |
doi |
10.23889/ijpds.v1i1.155 |
affiliation |
Curtin University (all authors) |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Sean Randall, Anna Ferrante, Adrian Brown, James Boyd, James Semmens |
title |
Assessing the impact of different grouping methods: time to rethink and regroup? |
publisher |
Swansea University |
series |
International Journal of Population Data Science |
issn |
2399-4908 |
publishDate |
2017-04-01 |
description |
ABSTRACT
Objectives
The grouping of record pairs to determine which administrative records belong to the same individual is an important process in record linkage. A variety of grouping methods are used, but the relative benefits of each are unknown. We evaluate a number of grouping methods against the traditional merge-based clustering approach using large-scale administrative data.
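The abstract does not spell out the merge-based approach. A common reading, assumed here, is that every record pair scoring at or above a threshold is accepted and groups are formed by transitive closure, so two records can end up together without ever being compared directly. The sketch below uses a union-find structure; `merge_groups`, the `(record_a, record_b, score)` pair format and the threshold are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Hypothetical input format: candidate pairs as (record_a, record_b, match_score).
Pair = Tuple[Hashable, Hashable, float]


def merge_groups(pairs: Iterable[Pair], threshold: float) -> List[Set[Hashable]]:
    """Group records by transitively merging every pair scoring >= threshold."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)  # unseen records start as their own group
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, score in pairs:
        root_a, root_b = find(a), find(b)  # registers both, even below threshold
        if score >= threshold and root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups: Dict[Hashable, Set[Hashable]] = {}
    for record in list(parent):
        groups.setdefault(find(record), set()).add(record)
    return list(groups.values())
```

With pairs `("A1", "B1", 0.95)`, `("B1", "C1", 0.90)` and `("A2", "B2", 0.40)` and a threshold of 0.8, this returns the groups {A1, B1, C1}, {A2} and {B2}: A1 and C1 are grouped only through B1, the kind of chaining a merge-based approach permits.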
Approach
The research aimed both to describe current grouping techniques used for record linkage and to evaluate which grouping method is most appropriate in specific circumstances. A range of grouping strategies were applied to three datasets with known truth sets. Conditions were simulated to investigate one-to-one, many-to-one and ongoing linkage scenarios.
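Comparing grouping strategies against known truth sets requires a quality measure. The abstract does not say which measures were used; this sketch assumes pairwise precision, recall and F-measure, where two records count as linked if they fall in the same group. The function names and the empty-set convention are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Hashable, Iterable, Set, Tuple

Groups = Iterable[Iterable[Hashable]]


def within_group_pairs(groups: Groups) -> Set[FrozenSet[Hashable]]:
    """All unordered record pairs that share a group."""
    pairs: Set[FrozenSet[Hashable]] = set()
    for group in groups:
        for a, b in combinations(set(group), 2):
            pairs.add(frozenset((a, b)))
    return pairs


def pairwise_quality(predicted: Groups, truth: Groups) -> Tuple[float, float, float]:
    """Pairwise precision, recall and F-measure of a grouping against a truth set."""
    found = within_group_pairs(predicted)
    actual = within_group_pairs(truth)
    tp = len(found & actual)  # pairs grouped together in both
    # Assumed convention: an empty pair set scores 1.0 instead of dividing by zero.
    precision = tp / len(found) if found else 1.0
    recall = tp / len(actual) if actual else 1.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Scoring the predicted groups {A1, B1, C1}, {A2}, {B2} against the truth {A1, B1}, {C1}, {A2, B2} gives precision 1/3, recall 1/2 and an F-measure of 0.4.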
Results
Results suggest that alternative grouping methods will yield large benefits in linkage quality, especially when the quality of the underlying repository is high. Stepwise grouping methods were clearly superior for one-to-one linkage. There appeared to be little difference in linkage quality between the many-to-one grouping approaches. The most appropriate technique for ongoing linkage depended on the quality of the population spine and of the underlying dataset.
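The abstract does not define stepwise grouping. One plausible interpretation for one-to-one linkage, assumed here, is a greedy best-first pass: candidate pairs between two files are taken in descending score order, and a pair is accepted only if neither record has already been matched. `stepwise_one_to_one` and its input format are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Hypothetical input format: (left_record, right_record, score) pairs between two files.
Pair = Tuple[Hashable, Hashable, float]


def stepwise_one_to_one(pairs: Iterable[Pair], threshold: float) -> Dict[Hashable, Hashable]:
    """Greedy best-first one-to-one assignment (an assumed reading of 'stepwise')."""
    matches: Dict[Hashable, Hashable] = {}
    used_right: Set[Hashable] = set()
    for left, right, score in sorted(pairs, key=lambda p: p[2], reverse=True):
        if score < threshold:
            break  # pairs are sorted, so every remaining pair also falls below
        if left in matches or right in used_right:
            continue  # one side already taken by a higher-scoring pair
        matches[left] = right
        used_right.add(right)
    return matches
```

For the pairs (L1, R1, 0.9), (L1, R2, 0.85), (L2, R1, 0.8) and (L2, R2, 0.7) with a threshold of 0.6, this returns {L1: R1, L2: R2}, whereas transitively merging the same pairs would place all four records in one group.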
Conclusions
These results demonstrate the large effect that the choice of grouping strategy can have on overall linkage quality. Ongoing linkages to high-quality population spines provide large improvements in linkage quality compared with merge-based linkages. Procuring or developing such a population spine will provide high linkage quality at far lower cost than current methods for improving linkage quality. Because it improves linkage quality at low cost, this resource can be further utilised by health researchers. |
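The abstract does not describe the algorithm behind ongoing linkage to a population spine. A minimal sketch, assuming each incoming record is compared with every spine entity, attached to the best-scoring entity at or above a threshold, and otherwise added as a new entity. `Spine`, `field_agreement` and `link_to_spine` are hypothetical names, and exact field agreement stands in for a real probabilistic match score.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


@dataclass
class Spine:
    """Hypothetical population spine: one representative record per entity."""
    entities: Dict[int, Record] = field(default_factory=dict)
    next_id: int = 1

    def add(self, record: Record) -> int:
        entity_id = self.next_id
        self.entities[entity_id] = record
        self.next_id += 1
        return entity_id


def field_agreement(a: Record, b: Record) -> float:
    """Toy score: share of common fields whose values agree exactly."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[k] == b[k] for k in shared) / len(shared)


def link_to_spine(spine: Spine, incoming: Iterable[Record],
                  score: Callable[[Record, Record], float],
                  threshold: float) -> List[int]:
    """Assign each incoming record to its best spine entity, or create a new one."""
    assigned: List[int] = []
    for record in incoming:
        candidates = [(score(record, entity), eid) for eid, entity in spine.entities.items()]
        best = max(candidates, default=None)
        if best is not None and best[0] >= threshold:
            assigned.append(best[1])  # many incoming records may share one entity
        else:
            assigned.append(spine.add(record))  # no acceptable match: new entity
    return assigned
```

With a two-entity spine and a threshold of 0.6, an incoming record identical to entity 1 links to it, a record agreeing with entity 2 on two of three fields links to entity 2, and a record that matches nothing well becomes entity 3. Each incoming record is attached to at most one entity, so there is no transitive merging; the result still depends on how good the spine is.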
url |
https://ijpds.org/article/view/155 |