Assessing the impact of different grouping methods: time to rethink and regroup?

ABSTRACT Objectives The grouping of record-pairs to determine which administrative records belong to the same individual is an important process in record linkage. A variety of grouping methods are used but the relative benefits of each are unknown. We evaluate a number of grouping methods agains...

Bibliographic Details
Main Authors: Sean Randall, Anna Ferrante, Adrian Brown, James Boyd, James Semmens
Format: Article
Language: English
Published: Swansea University 2017-04-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/155
id doaj-77f32a4e2556451d95ac6faf42c624da
spelling doaj-77f32a4e2556451d95ac6faf42c624da; 2020-11-24T23:32:46Z; eng; Swansea University; International Journal of Population Data Science; 2399-4908; 2017-04-01; 1; 1; 10.23889/ijpds.v1i1.155; 155; Assessing the impact of different grouping methods: time to rethink and regroup?; Sean Randall (Curtin University); Anna Ferrante (Curtin University); Adrian Brown (Curtin University); James Boyd (Curtin University); James Semmens (Curtin University); https://ijpds.org/article/view/155
collection DOAJ
issn 2399-4908
description ABSTRACT
Objectives: The grouping of record-pairs to determine which administrative records belong to the same individual is an important process in record linkage. A variety of grouping methods are used but the relative benefits of each are unknown. We evaluate a number of grouping methods against the traditional merge-based clustering approach using large-scale administrative data.
Approach: The research aimed both to describe current grouping techniques used for record linkage and to evaluate the most appropriate grouping method for specific circumstances. A range of grouping strategies was applied to three datasets with known truth sets. Conditions were simulated to appropriately investigate one-to-one, many-to-one and ongoing linkage scenarios.
Results: Results suggest alternative grouping methods will yield large benefits in linkage quality, especially when the quality of the underlying repository is high. Stepwise grouping methods were clearly superior for one-to-one linkage. There appeared to be little difference in linkage quality between many-to-one grouping approaches. The most appropriate techniques for ongoing linkage depended on the quality of the population spine and the underlying dataset.
Conclusions: These results demonstrate the large effect that the choice of grouping strategy can have on overall linkage quality. Ongoing linkages to high-quality population spines provide large improvements in linkage quality compared to merge-based linkages. Procuring or developing such a population spine will provide high linkage quality at far lower cost than current methods for improving linkage quality. By improving linkage quality at low cost, this resource can be further utilised by health researchers.