Preparing Pathology Data for Linkage

Introduction The Tasmanian Data Linkage Unit (TDLU) undertook a complex data linkage project in 2019 linking public and private pathology data to five disparate health datasets. Having linked pathology data previously, the unit was aware of the challenges it faced linking a large dataset covering a...


Bibliographic Details
Main Authors: Nadine Wiggins, Tim Albion, Brian Stokes, Matthew Jose
Format: Article
Language: English
Published: Swansea University 2020-12-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/1480
id doaj-9a7f76e9990f42df9952e3ddd1de4def
record_format Article
volume 5
issue 5
doi 10.23889/ijpds.v5i5.1480
author_affiliation University of Tasmania (all four authors)
collection DOAJ
sources DOAJ
issn 2399-4908
description
Introduction
The Tasmanian Data Linkage Unit (TDLU) undertook a complex data linkage project in 2019, linking public and private pathology data to five disparate health datasets. Having linked pathology data previously, the unit was aware of the challenges it faced in linking a large dataset covering a fourteen-year time span. The aim of this study was to use data linkage to develop a Tasmanian dataset to quantify the burden and distribution of chronic kidney disease, including identifying barriers to dialysis treatment services.

Objectives and Approach
A cohort was selected from public and private providers of pathology services in Tasmania from 2004-2017 to support the establishment of a comprehensive researchable dataset. A linkage plan was developed that included detailed processes for cleaning and de-duplicating the pathology data prior to linkage. The larger of the two, the private pathology dataset, comprised 3.9 million records, to which data cleaning strategies were applied. De-duplication generated an extensive clerical review workload, and methods to reduce it were devised and implemented as part of the linkage process.

Results
De-duplication based on exact matches reduced the size of the dataset from 3.9 million records to just over 520,000 individuals. Internal linkage of the dataset resulted in approximately 47,000 ‘groups’ eligible for review. Structured Query Language (SQL) queries were then constructed, which reduced the number of groups eligible for review by 42%. Further analysis determined an appropriate ‘cut-off’ threshold for clerical review and produced an estimate of the false positive links remaining.

Conclusion / Implications
Methods of reducing the amount of manual clerical review can be incorporated into a linkage design when there is a thorough understanding of the characteristics and content of the dataset to be linked. The methods used for this linkage project will be utilised for future projects using pathology data.
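
The exact-match de-duplication step mentioned in the abstract can be illustrated with a small sketch. This is not the TDLU's implementation: the abstract does not say which fields were compared or which database was used, so the table name `pathology`, its columns, the sample rows and the function `exact_dedup` are all hypothetical. The sketch only shows the general idea of collapsing records that agree exactly on every identifying field, leaving near-duplicates for a later linkage or clerical review step.

```python
import sqlite3

# Hypothetical sample records: (record_id, surname, given_name, dob, postcode).
# None of these values come from the source article.
RECORDS = [
    (1, "SMITH", "JOHN", "1950-01-01", "7000"),
    (2, "SMITH", "JOHN", "1950-01-01", "7000"),   # exact duplicate of record 1
    (3, "SMITH", "JON", "1950-01-01", "7000"),    # near duplicate, not an exact match
    (4, "NGUYEN", "ANH", "1972-06-30", "7250"),
    (5, "NGUYEN", "ANH", "1972-06-30", "7250"),   # exact duplicate of record 4
]


def exact_dedup(records):
    """Collapse records that agree exactly on all identifying fields.

    Returns one row per distinct combination of fields, holding the lowest
    record_id in the combination and the number of records collapsed into it.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pathology ("
        "record_id INTEGER, surname TEXT, given_name TEXT, dob TEXT, postcode TEXT)"
    )
    conn.executemany("INSERT INTO pathology VALUES (?, ?, ?, ?, ?)", records)
    rows = conn.execute(
        "SELECT MIN(record_id), surname, given_name, dob, postcode, COUNT(*) "
        "FROM pathology "
        "GROUP BY surname, given_name, dob, postcode "
        "ORDER BY MIN(record_id)"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in exact_dedup(RECORDS):
        print(row)
```

In this sketch the five sample records collapse to three rows. The "JON" record survives as its own row because it is not an exact match, which is the kind of case that internal linkage and clerical review would then have to resolve.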