Learning Subject-Specific Directed Acyclic Graphs With Mixed Effects Structural Equation Models From Observational Data

The identification of causal relationships between random variables from large-scale observational data using directed acyclic graphs (DAGs) is highly challenging. We propose a new mixed-effects structural equation model (mSEM) framework to estimate subject-specific DAGs, where we represent joint dis...

Bibliographic Details
Main Authors: Xiang Li, Shanghong Xie, Peter McColgan, Sarah J. Tabrizi, Rachael I. Scahill, Donglin Zeng, Yuanjia Wang
Format: Article
Language: English
Published: Frontiers Media S.A. 2018-10-01
Series: Frontiers in Genetics
Subjects: graphical models; network analysis; causal structure discovery; heterogeneity; regularization
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2018.00430/full
id doaj-6211291622cb4af79ce5cf4024e4167e
record_format Article
spelling doaj-6211291622cb4af79ce5cf4024e4167e
2020-11-25T02:26:02Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2018-10-01
9
10.3389/fgene.2018.00430
410326
Learning Subject-Specific Directed Acyclic Graphs With Mixed Effects Structural Equation Models From Observational Data
Xiang Li: Statistics and Decision Sciences, Janssen Research and Development, LLC, Raritan, NJ, United States
Shanghong Xie: Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, United States
Peter McColgan: National Hospital for Neurology and Neurosurgery, London, United Kingdom
Sarah J. Tabrizi: National Hospital for Neurology and Neurosurgery, London, United Kingdom
Rachael I. Scahill: National Hospital for Neurology and Neurosurgery, London, United Kingdom
Donglin Zeng: Department of Biostatistics, University of North Carolina, Chapel Hill, NC, United States
Yuanjia Wang: Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, United States
Yuanjia Wang: Departments of Psychiatry, Columbia University Medical Center, New York, NY, United States
collection DOAJ
language English
format Article
sources DOAJ
author Xiang Li
Shanghong Xie
Peter McColgan
Sarah J. Tabrizi
Rachael I. Scahill
Donglin Zeng
Yuanjia Wang
title Learning Subject-Specific Directed Acyclic Graphs With Mixed Effects Structural Equation Models From Observational Data
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2018-10-01
description The identification of causal relationships between random variables from large-scale observational data using directed acyclic graphs (DAGs) is highly challenging. We propose a new mixed-effects structural equation model (mSEM) framework to estimate subject-specific DAGs, where we represent the joint distribution of random variables in the DAG as a set of structural causal equations with mixed effects. The directed edges between nodes depend on observed exogenous covariates for each individual and on unobserved latent variables. The strength of each connection is decomposed into a fixed-effect term representing the average causal effect given the covariates and a random-effect term representing the latent causal effect due to unobserved pathways. This decomposition captures essential asymmetric structural information and heterogeneity between DAGs, which allows the causal structure to be identified from observational data. In addition, by pooling information across subject-specific DAGs, we can identify causal structure with high probability and estimate subject-specific networks with high precision. We propose a penalized likelihood-based approach to handle the multi-dimensionality of the DAG model, and a fast, iterative computational algorithm, DAG-MM, to estimate parameters in mSEM and achieve desirable sparsity by hard-thresholding the edges. We theoretically prove the identifiability of mSEM. Using simulations and an application to protein signaling data, we show substantially improved performance compared to existing methods and results consistent with a network estimated from interventional data. Lastly, we identify gray matter atrophy networks in brain regions of patients with Huntington's disease and corroborate our findings using white matter connectivity data collected from an independent study.
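The description above states that each edge strength combines a fixed-effect term given the covariates with a random-effect term, and that sparsity comes from hard-thresholding the edges. The sketch below illustrates only those two ideas on a toy three-node graph. It is not the DAG-MM algorithm: the additive linear form with a single scalar covariate, all function names, the threshold value, and the numbers are assumptions made for illustration.

```python
import numpy as np

def subject_edge_weights(fixed_base, fixed_slope, covariate, random_effect):
    """Edge weights for one subject (assumed additive form):
    fixed effect given a scalar covariate, plus a subject-level random effect."""
    return fixed_base + covariate * fixed_slope + random_effect

def hard_threshold(weights, tau):
    """Keep edges with |weight| >= tau and set all others to zero."""
    return np.where(np.abs(weights) >= tau, weights, 0.0)

def is_acyclic(weights):
    """A graph on p nodes is a DAG iff the p-th power of its 0/1 adjacency matrix is zero."""
    support = (weights != 0).astype(int)
    power = np.eye(len(support), dtype=int)
    for _ in range(len(support)):
        # Clip to 0/1 so entries record reachability, not path counts.
        power = np.minimum(power @ support, 1)
    return not power.any()

# Hypothetical values: entry [j, k] is the effect of node j on node k.
fixed_base = np.array([[0.0, 0.8, 0.0],
                       [0.0, 0.0, 0.5],
                       [0.0, 0.0, 0.0]])
fixed_slope = np.array([[0.0, 0.2, 0.0],
                        [0.0, 0.0, -0.1],
                        [0.0, 0.0, 0.0]])
random_effect = np.array([[0.0, 0.05, 0.02],
                          [0.0, 0.0, -0.03],
                          [0.0, 0.0, 0.0]])

weights = subject_edge_weights(fixed_base, fixed_slope, 1.0, random_effect)
sparse = hard_threshold(weights, tau=0.1)  # drops the weak 0 -> 2 edge
print(sparse)
print(is_acyclic(sparse))
```

In this toy case the small random-effect-only edge from node 0 to node 2 falls below the assumed threshold and is removed, while the two stronger edges survive.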
topic graphical models
network analysis
causal structure discovery
heterogeneity
regularization
url https://www.frontiersin.org/article/10.3389/fgene.2018.00430/full