Displaying Linkage Success Statistics to Identify Systemic Errors
ABSTRACT Objective The primary objective is to create a method for displaying linkage statistics to researchers, data stewards, and linkage specialists in an informative and meaningful way. The method must visually display the linkage summary data and highlight drops in the linkage success rate....
Main Authors: | Mike Simpson, Harold Yip, Brent Hills |
---|---|
Format: | Article |
Language: | English |
Published: | Swansea University, 2017-04-01 |
Series: | International Journal of Population Data Science |
Online Access: | https://ijpds.org/article/view/194 |
id |
doaj-4859d8b57c434b4f8f095c2ce9c4b53d |
---|---|
record_format |
Article |
spelling |
doaj-4859d8b57c434b4f8f095c2ce9c4b53d; 2020-11-24T23:47:28Z; eng; Swansea University; International Journal of Population Data Science; 2399-4908; 2017-04-01; 1; 1; 10.23889/ijpds.v1i1.194; 194; Displaying Linkage Success Statistics to Identify Systemic Errors; Mike Simpson (Population Data BC); Harold Yip (Population Data BC); Brent Hills (Population Data BC); https://ijpds.org/article/view/194 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Mike Simpson Harold Yip Brent Hills |
title |
Displaying Linkage Success Statistics to Identify Systemic Errors |
publisher |
Swansea University |
series |
International Journal of Population Data Science |
issn |
2399-4908 |
publishDate |
2017-04-01 |
description |
ABSTRACT
Objective
The primary objective is to create a method for displaying linkage statistics to researchers, data stewards, and linkage specialists in an informative and meaningful way. The method must visually display the linkage summary data and highlight drops in the linkage success rate.
Approach
We created a web interface that shows linkage statistics by age and geography in calendar/service years. Each cell contains both the percentage of linked values and the percentage of successfully linked data. The interface can be filtered by gender and data type, and can display either the number of successful or the number of unsuccessful linkages. Because a large volume of data appears on the screen at one time, we use a heat map to highlight cells that have unusually high or low values. Totals are displayed with their own heat maps, making it easy to compare years across age groups or age groups across years. We mask small cell sizes to preserve privacy.
Results
This approach allows people to easily spot drops in linkage success. If a particular year's data or age group has a lower linkage rate than the rest of the dataset, the heat map clearly highlights that discrepancy. Displaying the number of linkages along with the rate helps us determine whether sample size is playing a role in a low linkage success rate.
Conclusion
Data quality issues can silently cause linkage success rates to drop in certain years, geographies, age groups, or genders. Displaying linkage statistics on a single page with a heat map allows people to quickly spot inconsistencies in linkages. |
url |
https://ijpds.org/article/view/194 |
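The approach described in the abstract (a linkage rate per age group and year, small cells masked for privacy, and unusually low cells highlighted) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the record layout, the function names `build_cells` and `flag_low_cells`, the masking threshold of 5, and the rule "more than 10 percentage points below the median rate" are all hypothetical stand-ins. The abstract does not say how the heat map chooses its colours or what cell size is masked.

```python
from statistics import median

# Hypothetical threshold: cells with fewer records than this are masked.
MIN_CELL_SIZE = 5


def build_cells(records, min_cell_size=MIN_CELL_SIZE):
    """Aggregate (age_group, year, linked) records into table cells.

    Returns a dict keyed by (age_group, year). Each value holds the record
    count, the linked count, and the linkage rate, or None when the cell is
    too small to display.
    """
    counts = {}
    for age_group, year, linked in records:
        total, hits = counts.get((age_group, year), (0, 0))
        counts[(age_group, year)] = (total + 1, hits + (1 if linked else 0))

    cells = {}
    for key, (total, hits) in counts.items():
        if total < min_cell_size:
            cells[key] = None  # masked to preserve privacy
        else:
            cells[key] = {"total": total, "linked": hits, "rate": hits / total}
    return cells


def flag_low_cells(cells, drop=0.10):
    """Return keys of cells whose rate is more than `drop` below the median.

    A simple stand-in for a heat map: a real interface would map each rate
    to a colour instead of returning a list.
    """
    rates = [c["rate"] for c in cells.values() if c is not None]
    if not rates:
        return []
    baseline = median(rates)
    return sorted(
        key
        for key, c in cells.items()
        if c is not None and c["rate"] < baseline - drop
    )


# Made-up example data: one cell with a clear drop and one very small cell.
records = (
    [("20-29", 2015, True)] * 95 + [("20-29", 2015, False)] * 5
    + [("20-29", 2016, True)] * 60 + [("20-29", 2016, False)] * 40
    + [("30-39", 2015, True)] * 96 + [("30-39", 2015, False)] * 4
    + [("30-39", 2016, True)] * 3
)

cells = build_cells(records)
low = flag_low_cells(cells)
print(low)
```

Keeping the record count next to the rate in each cell mirrors the point made in the Results: a low rate can be read alongside its sample size before it is treated as a data quality problem.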