Preparing Pathology Data for Linkage
Main Authors: | Nadine Wiggins, Tim Albion, Brian Stokes, Matthew Jose |
---|---|
Format: | Article |
Language: | English |
Published: | Swansea University, 2020-12-01 |
Series: | International Journal of Population Data Science |
Online Access: | https://ijpds.org/article/view/1480 |
Affiliation: | University of Tasmania |
Volume: | 5 (Issue 5) |
ISSN: | 2399-4908 |
DOI: | 10.23889/ijpds.v5i5.1480 |
Introduction
The Tasmanian Data Linkage Unit (TDLU) undertook a complex data linkage project in 2019, linking public and private pathology data to five disparate health datasets. Having linked pathology data before, the unit was aware of the challenges of linking a large dataset spanning fourteen years. The aim of this study was to use data linkage to develop a Tasmanian dataset that quantifies the burden and distribution of chronic kidney disease and identifies barriers to dialysis treatment services.
Objectives and Approach
A cohort was selected from public and private providers of pathology services in Tasmania from 2004 to 2017 to support the establishment of a comprehensive, researchable dataset. The linkage plan included detailed processes for cleaning and de-duplicating the pathology data before linkage. The larger, private pathology dataset comprised 3.9 million records, to which data cleaning strategies were applied. De-duplication generated a large clerical review workload, so methods to reduce this work were devised and implemented as part of the linkage process.
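The abstract does not specify which cleaning rules or identifying fields were used. As a minimal sketch, assuming hypothetical fields (surname, given name, date of birth, sex and postcode) and a few assumed date layouts, the Python below shows why standardising values before exact-match de-duplication matters: records that differ only in case, punctuation, accents or date format collapse to a single identity key. The function names `clean_name`, `clean_dob`, `clean_record` and `exact_dedup` are illustrative and are not the TDLU's implementation.

```python
import re
import unicodedata
from datetime import datetime

# Hypothetical identifying fields; the abstract does not list the variables used.
FIELDS = ("surname", "given_name", "dob", "sex", "postcode")
# Assumed date layouts found in the raw extracts (illustrative only).
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d%m%Y")


def clean_name(value):
    """Upper-case, strip accents, keep only A-Z, spaces and hyphens."""
    if not value:
        return ""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    letters = re.sub(r"[^A-Z \-]", "", no_accents.upper())
    return re.sub(r"\s+", " ", letters).strip()


def clean_dob(value):
    """Parse one of the assumed date layouts to ISO 8601; '' if none fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime((value or "").strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return ""


def clean_record(rec):
    """Return the standardised identifying fields of one raw record."""
    return {
        "surname": clean_name(rec.get("surname")),
        "given_name": clean_name(rec.get("given_name")),
        "dob": clean_dob(rec.get("dob")),
        "sex": (rec.get("sex") or "").strip().upper()[:1],
        "postcode": re.sub(r"\D", "", rec.get("postcode") or ""),
    }


def exact_dedup(records):
    """Group raw records whose cleaned identifying fields match exactly.

    Returns {identity_key: [indices of the records sharing that key]}.
    """
    groups = {}
    for i, rec in enumerate(records):
        cleaned = clean_record(rec)
        key = tuple(cleaned[f] for f in FIELDS)
        groups.setdefault(key, []).append(i)
    return groups


if __name__ == "__main__":
    raw = [
        {"surname": "O'Brien", "given_name": "Mary", "dob": "01/02/1980", "sex": "f", "postcode": "7000"},
        {"surname": "OBRIEN ", "given_name": "mary", "dob": "1980-02-01", "sex": "F", "postcode": "7000"},
        {"surname": "Smith", "given_name": "John", "dob": "15/07/1975", "sex": "M", "postcode": "7250"},
    ]
    groups = exact_dedup(raw)
    print(len(raw), "records ->", len(groups), "identities")  # 3 records -> 2 identities
```

Exact matching on cleaned fields is deliberately conservative: it merges only records that agree on every field, leaving near-matches (such as a transposed day and month) for the later linkage and clerical review steps.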
Results
De-duplication based on exact matches reduced the dataset from 3.9 million records to just over 520,000 individuals. Internal linkage of the dataset produced approximately 47,000 ‘groups’ eligible for clerical review. Structured Query Language (SQL) queries were then constructed that reduced the number of groups eligible for review by 42%. Further analysis determined an appropriate cut-off threshold for clerical review and estimated the number of false-positive links remaining.
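The abstract does not say what the SQL queries tested. The sketch below, using Python's built-in `sqlite3` module, assumes a hypothetical `linked_records` table in which the internal linkage step has assigned each record a `group_id`, and a hypothetical triage rule: a group skips clerical review only if every member has a non-missing surname, date of birth and sex and all members agree on them. Groups with any disagreement or missing value stay in the review queue. Table, column and function names are illustrative.

```python
import sqlite3

# Hypothetical table: one row per record, with group_id assigned by the
# internal (within-dataset) linkage step.
SCHEMA = (
    "CREATE TABLE linked_records ("
    "group_id INTEGER, surname TEXT, given_name TEXT, dob TEXT, sex TEXT)"
)

# Hypothetical rule: auto-accept a group only when all members agree on
# surname, dob and sex with no missing values. COUNT(col) ignores NULLs,
# so COUNT(*) = COUNT(col) means the column is never missing in the group.
TRIAGE_SQL = """
SELECT group_id,
       CASE WHEN COUNT(DISTINCT surname) = 1
             AND COUNT(DISTINCT dob) = 1
             AND COUNT(DISTINCT sex) = 1
             AND COUNT(*) = COUNT(surname)
             AND COUNT(*) = COUNT(dob)
             AND COUNT(*) = COUNT(sex)
            THEN 1 ELSE 0 END AS auto_accept
FROM linked_records
GROUP BY group_id
ORDER BY group_id
"""


def build_db(rows):
    """Load (group_id, surname, given_name, dob, sex) tuples into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO linked_records VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def split_review_groups(conn):
    """Partition group ids into (auto_accepted, needs_review)."""
    accepted, review = [], []
    for group_id, auto_accept in conn.execute(TRIAGE_SQL):
        if auto_accept:
            accepted.append(group_id)
        else:
            review.append(group_id)
    return accepted, review


DEMO_ROWS = [
    (1, "SMITH", "JOHN", "1975-07-15", "M"),
    (1, "SMITH", "JON", "1975-07-15", "M"),    # given-name variant only
    (2, "JONES", "MARY", "1980-02-01", "F"),
    (2, "JONES", "MARY", "1980-01-02", "F"),   # day/month swap in DOB
    (3, "BROWN", "ANN", None, "F"),            # missing DOB
    (3, "BROWN", "ANN", "1990-05-05", "F"),
]

if __name__ == "__main__":
    accepted, review = split_review_groups(build_db(DEMO_ROWS))
    print("auto-accepted:", accepted)  # [1]
    print("for review:", review)       # [2, 3]
    share = len(accepted) / (len(accepted) + len(review))
    print(f"review workload cut by {share:.0%}")
```

Taking the reported figures at face value, a 42% reduction from approximately 47,000 groups removes roughly 19,700 groups from the queue and leaves about 27,300 for review.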
Conclusion / Implications
Methods of reducing manual clerical review can be built into a linkage design when the characteristics and content of the dataset to be linked are thoroughly understood. The methods used in this linkage project will be applied to future projects involving pathology data.
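The abstract reports a clerical-review cut-off and an estimate of remaining false positives but not how either was derived. One common approach, shown here as a sketch and not as the TDLU's method, assumes each candidate link has a numeric linkage score and that a sample of links has been clerically reviewed: pick the lowest observed score at which the reviewed sample's false-positive rate falls within a tolerance, accept links at or above that score without review, and scale the sample's false-positive rate to the unreviewed links. `ReviewedLink`, the 10% tolerance, the sample and the count of 6,000 unreviewed links are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewedLink:
    """One clerically reviewed link (hypothetical structure)."""
    score: float      # linkage weight assigned by the matching step
    is_match: bool    # reviewer's decision: True means same person


def _band(sample, cutoff):
    """Reviewed links whose score is at or above the cut-off."""
    return [r for r in sample if r.score >= cutoff]


def false_positive_rate(sample, cutoff):
    """Share of reviewed links at or above the cut-off judged non-matches."""
    band = _band(sample, cutoff)
    if not band:
        return 0.0
    return sum(not r.is_match for r in band) / len(band)


def choose_cutoff(sample, max_fp_rate):
    """Lowest observed score whose at-or-above band has FP rate <= max_fp_rate.

    Links at or above the returned cut-off would skip clerical review;
    returns None if no observed score meets the tolerance.
    """
    for cutoff in sorted({r.score for r in sample}):
        if false_positive_rate(sample, cutoff) <= max_fp_rate:
            return cutoff
    return None


def estimate_remaining_fp(sample, cutoff, n_unreviewed):
    """Scale the sample's FP rate above the cut-off to the unreviewed links."""
    band = _band(sample, cutoff)
    if not band:
        return 0.0
    false_positives = sum(not r.is_match for r in band)
    return n_unreviewed * false_positives / len(band)


# Hypothetical reviewed sample: (score, reviewer decision).
DEMO_SAMPLE = [ReviewedLink(s, m) for s, m in [
    (40.0, True), (38.0, True), (36.0, True), (35.0, True), (34.0, True),
    (33.0, False), (32.0, True), (31.0, True), (30.0, True), (29.0, True),
    (27.0, True), (22.0, True), (20.0, False), (19.0, True), (18.0, False),
    (15.0, False),
]]

if __name__ == "__main__":
    cutoff = choose_cutoff(DEMO_SAMPLE, max_fp_rate=0.10)
    print("cut-off:", cutoff)  # 22.0 for this sample
    # Assume 6,000 links at or above the cut-off were accepted without review.
    print("estimated false positives:",
          estimate_remaining_fp(DEMO_SAMPLE, cutoff, 6000))
```

The estimate is only as good as the sample: it assumes the reviewed links above the cut-off are representative of the unreviewed ones, so a random sample from that score band is the natural choice.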