Preparing Pathology Data for Linkage

Bibliographic Details
Main Authors: Nadine Wiggins, Tim Albion, Brian Stokes, Matthew Jose
Format: Article
Language: English
Published: Swansea University 2020-12-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/1480
Description
Summary:
Introduction: The Tasmanian Data Linkage Unit (TDLU) undertook a complex data linkage project in 2019, linking public and private pathology data to five disparate health datasets. Having linked pathology data previously, the unit was aware of the challenges of linking a large dataset covering a fourteen-year time span. The aim of this study was to use data linkage to develop a Tasmanian dataset to quantify the burden and distribution of chronic kidney disease, including identifying barriers to dialysis treatment services.

Objectives and Approach: A cohort was selected from public and private providers of pathology services in Tasmania from 2004 to 2017 to support the establishment of a comprehensive researchable dataset. A linkage plan was developed that included detailed processes for cleaning and de-duplicating the pathology data prior to linkage. The larger, private pathology dataset comprised 3.9 million records, to which data cleaning strategies were applied. De-duplication generated an extensive clerical review workload, and methods to reduce this work were devised and implemented as part of the linkage process.

Results: De-duplication based on exact matches reduced the dataset from 3.9 million records to just over 520,000 individuals. Internal linkage of the dataset resulted in approximately 47,000 ‘groups’ eligible for review. Structured Query Language (SQL) queries were then constructed, which decreased the number of groups eligible for review by 42%. Further analysis determined an appropriate ‘cut off’ threshold for clerical review, and an estimate of the false positive links remaining was calculated.

Conclusion / Implications: Methods of reducing the amount of manual clerical review can be incorporated into a linkage design when there is a thorough understanding of the characteristics and content of the dataset to be linked. The methods used for this linkage project will be utilised for future projects using pathology data.
ISSN: 2399-4908
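
Illustrative note: The summary describes de-duplication based on exact matches as the step that collapsed pathology records into individuals before internal linkage. The record does not give the fields or tooling the TDLU used, so the following Python is only a minimal sketch of the general technique. The field names (surname, given_name, dob), the normalisation rule, and the sample records are all hypothetical.

```python
from collections import defaultdict


def normalise(value):
    """Trim, collapse internal whitespace, and upper-case a field value."""
    return " ".join(str(value).split()).upper()


def deduplicate_exact(records, key_fields):
    """Group records whose normalised key fields match exactly.

    Returns a dict mapping each distinct key tuple to its list of records.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalise(record[field]) for field in key_fields)
        groups[key].append(record)
    return groups


# Hypothetical sample data; not drawn from the study.
records = [
    {"surname": "Smith", "given_name": "Jane", "dob": "1970-01-02", "test": "eGFR"},
    {"surname": "SMITH ", "given_name": "jane", "dob": "1970-01-02", "test": "creatinine"},
    {"surname": "Smyth", "given_name": "Jane", "dob": "1970-01-02", "test": "eGFR"},
]

individuals = deduplicate_exact(records, ["surname", "given_name", "dob"])
print(len(records), "records ->", len(individuals), "individuals")
```

In this sketch the two "Smith" records collapse into one individual, while the "Smyth" record stays separate. Near matches of that kind are what a later internal linkage step would group for possible clerical review.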