Harmonization of data from cohort studies – potential challenges and opportunities

Bibliographic Details
Main Authors: Kamala Adhikari Dahal, Scott Patten, Tyler Williamson, Alka Patel, Shahirose Premji, Suzanne Tough, Nicole Letourneau, Gerald Giesbrecht
Format: Article
Language: English
Published: Swansea University, 2018-09-01
Series: International Journal of Population Data Science
Online Access:https://ijpds.org/article/view/868
ISSN: 2399-4908
Volume: 3, Issue: 4
DOI: 10.23889/ijpds.v3i4.868
Affiliations: University of Calgary (Kamala Adhikari Dahal, Scott Patten, Tyler Williamson, Shahirose Premji, Nicole Letourneau, Gerald Giesbrecht); Alberta Health Services (Alka Patel); Paediatrics, Cumming School of Medicine, University of Calgary (Suzanne Tough)

Abstract

Introduction
Pooling data from cohort studies can increase sample size. However, individual datasets may measure the same construct differently, which limits the usefulness of the combined dataset. Variable harmonization (an effort to provide a comparable view of data from different studies) may address this issue.

Objectives and Approach
This study harmonized existing datasets from two prospective pregnancy cohort studies in Alberta, Canada: All Our Families (n=3,351) and Alberta Pregnancy Outcome and Nutrition (n=2,187). Given the comparable characteristics of the two cohorts and the similarity of their core data elements of interest, data harmonization was justifiable. Harmonization took multiple factors into account: complete or partial matching of the question asked and answered; how the response was coded (value level, value definition, data type); the frequency of measurement; the pregnancy time period of measurement; and missing values. Multiple imputation was used to address missing data resulting from the harmonization process.

Results
Several variables, such as ethnicity, income, parity, gestational age, anxiety, and depression, were harmonized using different procedures. If the question asked and answered and the response recorded were the same in both datasets, the variable was not manipulated. If the recorded responses differed, they were re-categorized or re-organized to optimize the comparability of data from both datasets. If the same construct was measured in both datasets but with different methods or scales, missing values were created for each resulting unmatched variable and then replaced using multiple imputation; a scale used in both datasets was identified as the reference standard. If variables were measured at multiple times and/or in different time periods, they were synchronized by pregnancy trimester. Finally, the harmonized datasets were pooled into a single dataset (n=5,588).

Conclusion/Implications
Variable harmonization is an important aspect of conducting research with multiple datasets. It offers an opportunity to increase study power by maximizing sample size, to permit more sophisticated statistical analyses, and to answer novel research questions that could not be addressed using a single study.
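
The pass-through, re-categorization, and pooling rules summarized in the Results can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record layouts, the cohort labels "A" and "B", and the field names (`gest_age_wk`, `parity_count`, `parity_cat`, `anxiety_x`) are all hypothetical.

```python
from typing import Optional

# Hypothetical raw records. Field names and codings are illustrative only,
# not the actual All Our Families or APrON data dictionaries.
cohort_a = [
    {"id": "A1", "gest_age_wk": 39, "parity_count": 0, "anxiety_x": 12},
    {"id": "A2", "gest_age_wk": 37, "parity_count": 2, "anxiety_x": 30},
]
cohort_b = [
    {"id": "B1", "gest_age_wk": 40, "parity_cat": "nulliparous"},
    {"id": "B2", "gest_age_wk": 38, "parity_cat": "multiparous"},
]


def parity_from_count(count: Optional[int]) -> Optional[str]:
    """Re-categorize a count of previous births into the shared binary coding."""
    if count is None:
        return None
    return "nulliparous" if count == 0 else "multiparous"


def harmonize_a(rec: dict) -> dict:
    return {
        "cohort": "A",  # source indicator retained in the pooled data
        "id": rec["id"],
        # Same question and coding in both cohorts: passed through unchanged.
        "gest_age_wk": rec.get("gest_age_wk"),
        # Different response coding: re-categorized to the common scheme.
        "parity": parity_from_count(rec.get("parity_count")),
        # Hypothetical scale administered in cohort A only.
        "anxiety_x": rec.get("anxiety_x"),
    }


def harmonize_b(rec: dict) -> dict:
    return {
        "cohort": "B",
        "id": rec["id"],
        "gest_age_wk": rec.get("gest_age_wk"),
        "parity": rec.get("parity_cat"),  # already in the common coding
        # Scale not administered in cohort B: structurally missing (None),
        # left for a later multiple-imputation step.
        "anxiety_x": None,
    }


# Every harmonized row has the same keys, so the lists can simply be concatenated.
pooled = [harmonize_a(r) for r in cohort_a] + [harmonize_b(r) for r in cohort_b]
for row in pooled:
    print(row)
```

Keeping a cohort indicator in every pooled row means later analyses can still adjust for, or stratify by, study of origin.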
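
Synchronizing repeated measures by pregnancy trimester can be illustrated with a small sketch. It assumes each measurement carries the gestational week at which it was taken, uses assumed trimester cut-points (first below 14 weeks, second 14 to below 28, third 28 and above), and averages multiple measurements falling in the same trimester; the function names, the averaging rule, and the example schedules are hypothetical choices, not the authors' procedure.

```python
from collections import defaultdict
from statistics import mean


def trimester(gest_week: float) -> int:
    """Map gestational age in weeks to trimester 1, 2 or 3.

    Assumed cut-points: first < 14 weeks, second 14 to < 28, third >= 28.
    """
    if gest_week < 14:
        return 1
    if gest_week < 28:
        return 2
    return 3


def synchronize(measurements):
    """Collapse repeated measures onto a common trimester grid.

    measurements: iterable of (participant_id, gestational_week, score).
    Returns {participant_id: {trimester: mean score in that trimester}}.
    Averaging within a trimester is an assumption; keeping the earliest or
    latest measurement would be reasonable alternatives.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for pid, week, score in measurements:
        buckets[pid][trimester(week)].append(score)
    return {
        pid: {t: mean(scores) for t, scores in sorted(by_t.items())}
        for pid, by_t in buckets.items()
    }


# Hypothetical schedules: cohort A measures once per trimester,
# cohort B twice in the second trimester and not at all in the first.
measurements = [
    ("A1", 12, 10), ("A1", 20, 14), ("A1", 32, 9),
    ("B1", 17, 8), ("B1", 24, 12), ("B1", 34, 11),
]
print(synchronize(measurements))
```

A trimester with no measurement (here, trimester 1 for "B1") is simply absent from the result, which is where a harmonized dataset would record a missing value.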
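
The abstract states that missing values created by harmonization were replaced using multiple imputation, but it does not describe the imputation model. The sketch below therefore swaps in a deliberately simple, covariate-free technique, the approximate Bayesian bootstrap, and combines the per-imputation estimates of a mean with Rubin's rules. It assumes values are missing completely at random, which is unlikely to hold when a scale is missing for an entire cohort; a real analysis would more plausibly impute from covariates shared by both cohorts (for example, with chained equations). The function names and data are hypothetical.

```python
import random
from statistics import mean, variance


def abb_impute(values, rng):
    """One approximate Bayesian bootstrap imputation.

    Draws a bootstrap sample (with replacement) of the observed values, then
    fills each missing entry (None) with a random draw from that sample.
    Raises IndexError if no values are observed.
    """
    observed = [v for v in values if v is not None]
    boot = [rng.choice(observed) for _ in observed]
    return [v if v is not None else rng.choice(boot) for v in values]


def rubin_pool_mean(values, m=20, seed=1):
    """Impute m times, estimate the mean of each completed dataset, and
    combine with Rubin's rules. Returns (pooled mean, total variance)."""
    rng = random.Random(seed)
    n = len(values)
    estimates, within = [], []
    for _ in range(m):
        completed = abb_impute(values, rng)
        estimates.append(mean(completed))
        within.append(variance(completed) / n)  # squared standard error of the mean
    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(within)         # average within-imputation variance
    b = variance(estimates)      # between-imputation variance
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


# Hypothetical scores on a reference-standard scale; None marks values that
# were structurally missing after harmonization.
scores = [10, 12, None, 14, None, 11, 13]
estimate, total_var = rubin_pool_mean(scores, m=20, seed=1)
print(f"pooled mean = {estimate:.2f}, total variance = {total_var:.3f}")
```

The between-imputation term `b` is what distinguishes multiple from single imputation: it carries the extra uncertainty caused by the missing values into the final variance instead of treating imputed values as if they were observed.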