Harmonization of data from cohort studies– potential challenges and opportunities
Introduction: Pooling data from cohort studies can increase sample size. However, individual datasets may contain variables that measure the same construct differently, which limits the usefulness of the combined datasets. Variable harmonization (an effort that provides a comparable vie...
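Main Authors: | Kamala Adhikari Dahal, Scott Patten, Tyler Williamson, Alka Patel, Shahirose Premji, Suzanne Tough, Nicole Letourneau, Gerald Giesbrecht |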
---|---|
Format: | Article |
Language: | English |
Published: | Swansea University 2018-09-01 |
Series: | International Journal of Population Data Science |
Online Access: | https://ijpds.org/article/view/868 |
id |
doaj-478a7a1b1d7e4069a83862c6e76c7e3a |
---|---|
record_format |
Article |
spelling |
doaj-478a7a1b1d7e4069a83862c6e76c7e3a
2020-11-24T21:45:38Z
eng
Swansea University
International Journal of Population Data Science
2399-4908
2018-09-01
Volume 3, Issue 4
10.23889/ijpds.v3i4.868
868
Harmonization of data from cohort studies– potential challenges and opportunities
Kamala Adhikari Dahal (University of Calgary)
Scott Patten (University of Calgary)
Tyler Williamson (University of Calgary)
Alka Patel (Alberta Health Services)
Shahirose Premji (University of Calgary)
Suzanne Tough (Paediatrics, Cumming School of Medicine, University of Calgary)
Nicole Letourneau (University of Calgary)
Gerald Giesbrecht (University of Calgary)
https://ijpds.org/article/view/868 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Kamala Adhikari Dahal, Scott Patten, Tyler Williamson, Alka Patel, Shahirose Premji, Suzanne Tough, Nicole Letourneau, Gerald Giesbrecht |
author_sort |
Kamala Adhikari Dahal |
title |
Harmonization of data from cohort studies– potential challenges and opportunities |
publisher |
Swansea University |
series |
International Journal of Population Data Science |
issn |
2399-4908 |
publishDate |
2018-09-01 |
description |
Introduction
Pooling data from cohort studies can increase sample size. However, individual datasets may contain variables that measure the same construct differently, which limits the usefulness of the combined datasets. Variable harmonization (an effort that provides a comparable view of data from different studies) may address this issue.
Objectives and Approach
This study harmonized existing datasets from two prospective pregnancy cohort studies in Alberta, Canada: All Our Families (n=3,351) and Alberta Pregnancy Outcome and Nutrition (n=2,187). Because the two cohorts had comparable characteristics and similar core data elements of interest, data harmonization was justifiable. Harmonization considered multiple factors: complete or partial variable matching with respect to the question asked and answered, the coding of responses (value level, value definition, data type), the frequency of measurement, the pregnancy time-period of measurement, and missing values. Multiple imputation was used to address missing data resulting from the harmonization process.
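The matching factors listed above can be illustrated with a minimal sketch. It is not the authors' procedure: the `VariableSpec` codebook structure, the `differences` and `match_level` helpers, and the parity entries are all hypothetical, and the pair of variables is assumed to have already been judged to measure the same construct.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class VariableSpec:
    """Hypothetical codebook entry for one variable in one cohort."""
    question: str               # wording of the question asked
    levels: FrozenSet[str]      # response codes (value levels)
    dtype: str                  # e.g. "categorical" or "continuous"
    trimesters: FrozenSet[int]  # trimesters in which it was measured


def differences(a: VariableSpec, b: VariableSpec) -> List[str]:
    """List the facets on which two same-construct variables differ."""
    diffs = []
    if a.question.strip().lower() != b.question.strip().lower():
        diffs.append("question")
    if a.levels != b.levels:
        diffs.append("levels")
    if a.dtype != b.dtype:
        diffs.append("dtype")
    if a.trimesters != b.trimesters:
        diffs.append("timing")
    return diffs


def match_level(a: VariableSpec, b: VariableSpec) -> str:
    """'complete' if nothing differs, otherwise 'partial'."""
    return "partial" if differences(a, b) else "complete"


# Illustrative (invented) codebook entries for parity in two cohorts.
parity_a = VariableSpec(
    "How many times have you given birth?",
    frozenset({"0", "1", "2+"}), "categorical", frozenset({2}),
)
parity_b = VariableSpec(
    "How many times have you given birth?",
    frozenset({"0", "1", "2", "3+"}), "categorical", frozenset({2}),
)
```

Here `match_level(parity_a, parity_b)` reports a partial match because only the value levels differ, which signals that the responses need re-categorizing before pooling.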
Results
Several variables, such as ethnicity, income, parity, gestational age, anxiety, and depression, were harmonized using different procedures. If the question asked and answered and the response recorded were the same in both datasets, no variable manipulation was done. If the response recorded was different, the response was re-categorized or re-organized to optimize comparability of data from both datasets. If the same construct was measured in both datasets but with different methods or scales, missing values were created for each resulting unmatched variable and then replaced using multiple imputation. A scale that was used in both datasets was identified as a reference standard. If variables were measured multiple times and/or in different time-periods, they were synchronized using pregnancy trimester data. Finally, the harmonized datasets were pooled into a single dataset (n=5,588).
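The recoding, trimester synchronization, and pooling steps described above might look roughly like the following sketch. Everything in it is an assumption made for illustration: the cohort labels, the income codes and categories, the field names, and the trimester cut-offs (before week 14, weeks 14 to 27, week 28 onward) do not come from the study. Multiple imputation is not shown; values left as `None` simply mark where it would be applied.

```python
# Hypothetical mapping from each cohort's income codes to shared categories.
INCOME_MAP = {
    "cohort_a": {1: "<40k", 2: "40k-79k", 3: "40k-79k", 4: ">=80k"},
    "cohort_b": {"low": "<40k", "mid": "40k-79k", "high": ">=80k"},
}


def trimester(gestational_week):
    """Map a gestational week to a trimester (assumed cut-offs)."""
    if gestational_week is None:
        return None
    if gestational_week < 14:
        return 1
    if gestational_week < 28:
        return 2
    return 3


def harmonize(record, cohort):
    """Return one harmonized row for a raw participant record."""
    return {
        "cohort": cohort,
        # Re-categorize the response; unmapped or absent codes become None.
        "income": INCOME_MAP[cohort].get(record.get("income")),
        # Synchronize measurement timing on the trimester.
        "trimester": trimester(record.get("ga_weeks")),
        # Carry over only the scale shared by both cohorts (the reference
        # standard); None marks a value left for later imputation.
        "depression_ref": record.get("depression_ref"),
    }


# Invented example records.
cohort_a = [
    {"income": 2, "ga_weeks": 12, "depression_ref": 7},
    {"income": 4, "ga_weeks": 30, "depression_other": 11},
]
cohort_b = [{"income": "low", "ga_weeks": 20, "depression_ref": 3}]

# Pool the harmonized rows into a single dataset.
pooled = [harmonize(r, "cohort_a") for r in cohort_a] + [
    harmonize(r, "cohort_b") for r in cohort_b
]
```

Keeping a `cohort` column in the pooled rows lets later analyses adjust for or stratify by the source study.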
Conclusion/Implications
Variable harmonization is an important aspect of conducting research using multiple datasets. It provides an opportunity to increase study power by maximizing sample size, to permit more sophisticated statistical analyses, and to answer novel research questions that could not be addressed using a single study.
|
url |
https://ijpds.org/article/view/868 |