Augmenting cancer registry data with health survey data with no cases in common: the relationship between pre-diagnosis health behaviour and post-diagnosis survival in oesophageal cancer

Abstract Background For epidemiological research, cancer registry datasets often need to be augmented with additional data. Data linkage is not feasible when there are no cases in common between data sets. We present a novel approach to augmenting cancer registry data by imputing pre-diagnosis healt...


Bibliographic Details
Main Authors: Paul P. Fahey, Andrew Page, Glenn Stone, Thomas Astell-Burt
Format: Article
Language: English
Published: BMC 2020-06-01
Series: BMC Cancer
Subjects: Cancer registries; Alcohol drinking; Oesophageal neoplasms; Exercise; Obesity; Tobacco smoking
Online Access: http://link.springer.com/article/10.1186/s12885-020-06990-3
id doaj-07230846216647bfa411015381ffe813
record_format Article
spelling doaj-07230846216647bfa411015381ffe813
2020-11-25T03:29:46Z
eng
BMC
BMC Cancer
1471-2407
2020-06-01
vol. 20, no. 1, pp. 1-11
10.1186/s12885-020-06990-3
Augmenting cancer registry data with health survey data with no cases in common: the relationship between pre-diagnosis health behaviour and post-diagnosis survival in oesophageal cancer
Paul P. Fahey (School of Science and Health, Western Sydney University)
Andrew Page (Translational Health Research Institute, Western Sydney University)
Glenn Stone (School of Computing, Engineering and Mathematics, Western Sydney University)
Thomas Astell-Burt (Population Wellbeing and Environment Research Lab (PowerLab), School of Health and Society, Faculty of Social Sciences, University of Wollongong)
Abstract. Background: For epidemiological research, cancer registry datasets often need to be augmented with additional data. Data linkage is not feasible when there are no cases in common between data sets. We present a novel approach to augmenting cancer registry data by imputing pre-diagnosis health behaviour and estimating its relationship with post-diagnosis survival time. Methods: Six measures of pre-diagnosis health behaviours (focussing on tobacco smoking, ‘at risk’ alcohol consumption, overweight and exercise) were imputed for 28,000 US cancer registry records of oesophageal cancer using cold deck imputation from an unrelated health behaviour dataset. Each data point was imputed twice, and this calibration allowed us to estimate the misclassification rate. We applied statistical correction for the misclassification to estimate the relative risk of dying within 1 year of diagnosis for each of the imputed behaviour variables. Subgroup analyses were conducted for adenocarcinoma and squamous cell carcinoma separately. Results: Simulated survival data confirmed that accurate estimates of true relative risks could be retrieved for health behaviours with greater than 5% prevalence, although confidence intervals were wide. Applied to real datasets, the estimated relative risks were largely consistent with current knowledge. For example, tobacco smoking status 5 years prior to diagnosis was associated with an increased age-adjusted risk of all-cause death within 1 year of diagnosis for oesophageal squamous cell carcinoma (RR = 1.99, 95% CI 1.24, 3.12) but not for oesophageal adenocarcinoma (RR = 1.61, 95% CI 0.79, 2.57). Conclusions: We have demonstrated a novel imputation-based algorithm for augmenting cancer registry data for epidemiological research which can be used when there are no cases in common between data sets. The algorithm allows investigation of research questions which could not be addressed through direct data linkage.
http://link.springer.com/article/10.1186/s12885-020-06990-3
Cancer registries
Alcohol drinking
Oesophageal neoplasms
Exercise
Obesity
Tobacco smoking
collection DOAJ
language English
format Article
sources DOAJ
author Paul P. Fahey
Andrew Page
Glenn Stone
Thomas Astell-Burt
title Augmenting cancer registry data with health survey data with no cases in common: the relationship between pre-diagnosis health behaviour and post-diagnosis survival in oesophageal cancer
publisher BMC
series BMC Cancer
issn 1471-2407
publishDate 2020-06-01
description Abstract. Background: For epidemiological research, cancer registry datasets often need to be augmented with additional data. Data linkage is not feasible when there are no cases in common between data sets. We present a novel approach to augmenting cancer registry data by imputing pre-diagnosis health behaviour and estimating its relationship with post-diagnosis survival time. Methods: Six measures of pre-diagnosis health behaviours (focussing on tobacco smoking, ‘at risk’ alcohol consumption, overweight and exercise) were imputed for 28,000 US cancer registry records of oesophageal cancer using cold deck imputation from an unrelated health behaviour dataset. Each data point was imputed twice, and this calibration allowed us to estimate the misclassification rate. We applied statistical correction for the misclassification to estimate the relative risk of dying within 1 year of diagnosis for each of the imputed behaviour variables. Subgroup analyses were conducted for adenocarcinoma and squamous cell carcinoma separately. Results: Simulated survival data confirmed that accurate estimates of true relative risks could be retrieved for health behaviours with greater than 5% prevalence, although confidence intervals were wide. Applied to real datasets, the estimated relative risks were largely consistent with current knowledge. For example, tobacco smoking status 5 years prior to diagnosis was associated with an increased age-adjusted risk of all-cause death within 1 year of diagnosis for oesophageal squamous cell carcinoma (RR = 1.99, 95% CI 1.24, 3.12) but not for oesophageal adenocarcinoma (RR = 1.61, 95% CI 0.79, 2.57). Conclusions: We have demonstrated a novel imputation-based algorithm for augmenting cancer registry data for epidemiological research which can be used when there are no cases in common between data sets. The algorithm allows investigation of research questions which could not be addressed through direct data linkage.
topic Cancer registries
Alcohol drinking
Oesophageal neoplasms
Exercise
Obesity
Tobacco smoking
url http://link.springer.com/article/10.1186/s12885-020-06990-3
_version_ 1724577238016327680
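
The abstract in this record reports relative risks (RR) of death within 1 year of diagnosis together with 95% confidence intervals, and treats an interval that spans 1 as showing no clear association. As a reading aid only, the sketch below computes a crude (unadjusted) relative risk and its confidence interval from a 2x2 table using the standard log-scale standard error. It is not the authors' estimator, which is age-adjusted and corrected for misclassification of the imputed behaviours. The function names and the example counts are hypothetical and are not taken from the article.

```python
import math


def relative_risk(exposed_deaths, exposed_total,
                  unexposed_deaths, unexposed_total, z=1.96):
    """Crude relative risk with a Wald confidence interval on the log scale.

    Assumption: a simple 2x2 table with no adjustment and no correction
    for misclassification; z=1.96 gives an approximate 95% interval.
    """
    risk_exposed = exposed_deaths / exposed_total
    risk_unexposed = unexposed_deaths / unexposed_total
    rr = risk_exposed / risk_unexposed

    # Standard error of log(RR) for cumulative-incidence data.
    se_log_rr = math.sqrt(
        1 / exposed_deaths - 1 / exposed_total
        + 1 / unexposed_deaths - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def excludes_one(lower, upper):
    """True when the interval lies entirely above or entirely below 1."""
    return lower > 1.0 or upper < 1.0


# Hypothetical counts, for illustration only (not data from the article):
# 60 of 100 exposed and 30 of 100 unexposed cases die within 1 year.
rr, lower, upper = relative_risk(60, 100, 30, 100)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f}, {upper:.2f}")

# Intervals quoted in the abstract.
print(excludes_one(1.24, 3.12))  # squamous cell carcinoma
print(excludes_one(0.79, 2.57))  # adenocarcinoma
```

Applied to the intervals quoted in the abstract, `excludes_one` returns `True` for squamous cell carcinoma (1.24, 3.12) and `False` for adenocarcinoma (0.79, 2.57), which matches the abstract's wording that smoking was associated with increased risk for the former "but not" the latter.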