Displaying Linkage Success Statistics to Identify Systemic Errors

ABSTRACT Objective: The primary objective is to create a method for displaying linkage statistics to researchers, data stewards, and linkage specialists in an informative and meaningful way. The method must visually display the linkage summary data and highlight drops in the linkage success rate....

Bibliographic Details
Main Authors: Mike Simpson, Harold Yip, Brent Hills
Format: Article
Language: English
Published: Swansea University 2017-04-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/194
id doaj-4859d8b57c434b4f8f095c2ce9c4b53d
record_format Article
spelling doaj-4859d8b57c434b4f8f095c2ce9c4b53d; 2020-11-24T23:47:28Z; eng; Swansea University; International Journal of Population Data Science; 2399-4908; 2017-04-01; 1; 1; 10.23889/ijpds.v1i1.194; 194; Displaying Linkage Success Statistics to Identify Systemic Errors; Mike Simpson (Population Data BC); Harold Yip (Population Data BC); Brent Hills (Population Data BC); https://ijpds.org/article/view/194
collection DOAJ
language English
format Article
sources DOAJ
author Mike Simpson
Harold Yip
Brent Hills
spellingShingle Mike Simpson
Harold Yip
Brent Hills
Displaying Linkage Success Statistics to Identify Systemic Errors
International Journal of Population Data Science
author_facet Mike Simpson
Harold Yip
Brent Hills
author_sort Mike Simpson
title Displaying Linkage Success Statistics to Identify Systemic Errors
title_short Displaying Linkage Success Statistics to Identify Systemic Errors
title_full Displaying Linkage Success Statistics to Identify Systemic Errors
title_fullStr Displaying Linkage Success Statistics to Identify Systemic Errors
title_full_unstemmed Displaying Linkage Success Statistics to Identify Systemic Errors
title_sort displaying linkage success statistics to identify systemic errors
publisher Swansea University
series International Journal of Population Data Science
issn 2399-4908
publishDate 2017-04-01
description ABSTRACT Objective: The primary objective is to create a method for displaying linkage statistics to researchers, data stewards, and linkage specialists in an informative and meaningful way. The method must visually display the linkage summary data and highlight drops in the linkage success rate. Approach: We created a web interface that shows linkage statistics by age and geography in calendar/service years. Each cell contains both the percentage of linked values and the percentage of successfully linked data. The interface can be filtered by gender and data type, and by whether to display the number of successful or unsuccessful linkages. Because a large volume of data appears on the screen at one time, we use a heat map to highlight cells with unusually high or low values. Totals are displayed with their own heat maps, making it easy to compare years across age groups or age groups across years. We mask small cell sizes to preserve privacy. Results: This approach allows people to easily spot drops in linkage success. If a particular year’s data or age group has a lower linkage rate than the rest of the dataset, the heat map clearly highlights that discrepancy. Displaying the number of linkages along with the rate helps us determine whether sample size is playing a role in a low linkage success rate. Conclusion: Data quality issues can silently cause linkage success rates to drop in certain years, geographies, age groups, or genders. Displaying linkage statistics on a single page with a heat map allows people to quickly spot inconsistencies in linkages.
url https://ijpds.org/article/view/194
work_keys_str_mv AT mikesimpson displayinglinkagesuccessstatisticstoidentifysystemicerrors
AT haroldyip displayinglinkagesuccessstatisticstoidentifysystemicerrors
AT brenthills displayinglinkagesuccessstatisticstoidentifysystemicerrors
_version_ 1725489481458384896
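
Illustrative sketch (not part of the catalogue record). The abstract describes a grid of linkage rates by year and age group, with small cells masked and unusual cells highlighted by a heat map. The Python below is one possible way to compute such a grid, not the authors' implementation. Every name in it (`linkage_table`, `flag_low_cells`, `heat_colour`), the minimum cell size of 5, and the z-score rule for "unusually low" are assumptions made for illustration; the abstract does not state the masking threshold or how unusual values are defined.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Assumed masking threshold; the abstract does not give the real value.
MIN_CELL_SIZE = 5


def linkage_table(records, min_cell_size=MIN_CELL_SIZE):
    """Aggregate (year, age_group, linked) tuples into per-cell statistics.

    Returns a dict keyed by (year, age_group). Each value is either a dict
    with the record count, linked count, and linkage rate, or None when the
    cell is too small to show and is therefore masked.
    """
    counts = defaultdict(lambda: [0, 0])  # [total records, linked records]
    for year, age_group, linked in records:
        cell = counts[(year, age_group)]
        cell[0] += 1
        if linked:
            cell[1] += 1

    table = {}
    for key, (total, linked) in counts.items():
        if total < min_cell_size:
            table[key] = None  # masked to preserve privacy
        else:
            table[key] = {"n": total, "linked": linked, "rate": linked / total}
    return table


def flag_low_cells(table, z_threshold=1.5):
    """Return the unmasked cells whose rate is unusually low.

    "Unusually low" is defined here (an assumption) as more than
    z_threshold population standard deviations below the mean cell rate.
    """
    rates = {key: cell["rate"] for key, cell in table.items() if cell is not None}
    if len(rates) < 2:
        return []
    mu = mean(rates.values())
    sigma = pstdev(rates.values())
    if sigma == 0:
        return []
    return sorted(key for key, rate in rates.items()
                  if (rate - mu) / sigma < -z_threshold)


def heat_colour(rate, lo, hi):
    """Map a rate onto a red (low) to green (high) hex colour."""
    if hi == lo:
        t = 1.0
    else:
        t = min(1.0, max(0.0, (rate - lo) / (hi - lo)))
    red = round(255 * (1 - t))
    green = round(255 * t)
    return f"#{red:02x}{green:02x}00"
```

Keeping the record count `n` next to each rate mirrors the point made in the abstract: a reader can check whether a low linkage rate sits on a small sample before treating it as a data quality problem.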