Clustering of temporal gene expression data with mixtures of mixed effects models

While time-dependent processes are important to biological functions, methods to leverage temporal information from large data have remained computationally challenging. In temporal gene-expression data, clustering can be used to identify genes with shared function in complex processes. Algorithms...

Bibliographic Details
Main Author:	Lu, Darlene
Other Authors:	Demissie, Serkalem
Language:	en_US
Published:	2019
Subjects:	Biostatistics Clustering EM algorithm Gene expression Mixture model Model selection Polynomial regression
Online Access:	https://hdl.handle.net/2144/34905

id	ndltd-bu.edu-oai-open.bu.edu-2144-34905
record_format	oai_dc
spelling	ndltd-bu.edu-oai-open.bu.edu-2144-34905 2019-12-22T15:11:48Z Clustering of temporal gene expression data with mixtures of mixed effects models Lu, Darlene Demissie, Serkalem Biostatistics Clustering EM algorithm Gene expression Mixture model Model selection Polynomial regression 2021-02-27T00:00:00Z 2019-04-23T17:33:01Z 2019 2019-02-27T17:02:54Z Thesis/Dissertation https://hdl.handle.net/2144/34905 en_US Attribution 4.0 International http://creativecommons.org/licenses/by/4.0/
collection	NDLTD
language	en_US
sources	NDLTD
topic	Biostatistics Clustering EM algorithm Gene expression Mixture model Model selection Polynomial regression
description	While time-dependent processes are important to biological functions, methods that leverage temporal information from large datasets remain computationally challenging. In temporal gene-expression data, clustering can be used to identify genes with shared function in complex processes. Algorithms like K-Means and standard Gaussian mixture models (GMM) fail to account for variability in replicated data or repeated measures over time, and they require the number of clusters to be specified a priori, so many candidate cluster numbers must be evaluated to select an optimal result. An improved penalized GMM offers a computationally efficient algorithm that simultaneously optimizes the cluster number and the cluster labels. The work presented in this dissertation was motivated by mouse bone-fracture models, where the aim was to determine patterns of temporal gene expression during bone-healing progression. To address this, an extension to the penalized GMM was proposed that accounts for correlation between replicated data and repeated measures over time by introducing random effects, using a mixture of mixed-effects polynomial regression models and an entropy-penalized EM algorithm (EPEM). First, the performance of EPEM for different mixed-effects models was assessed with simulation studies and applied to the fracture-healing study. Second, two modifications to address the high computational cost of EPEM were considered: one clustered subsets of the data determined by predicted polynomial order (S-EPEM), and the other used a modified initialization to decrease the initial burden (I-EPEM). Each was compared to EPEM and applied to the fracture-healing study. Lastly, as varied rates of fracture healing were observed for mice with different genetic backgrounds (strains), a new analysis strategy was proposed to compare patterns of temporal gene expression between different mouse strains and was assessed with simulation studies. Expression profiles for each strain were treated as separate objects to cluster, in order to determine which genes were clustered into different groups across strains. We found that the addition of random effects decreased the accuracy of predicted cluster labels compared to K-Means, GMM, and fixed-effects EPEM. Polynomial-order optimization with BIC performed with the highest accuracy, and optimization on subspaces obtained with singular value decomposition performed well. Computation time for S-EPEM was much reduced, with a slight decrease in accuracy. I-EPEM was comparable to EPEM, with similar accuracy and a decrease in computation time. Application of the new analysis strategy to the fracture-healing data identified several distinct temporal gene-expression patterns for the different strains.
author2	Demissie, Serkalem
author_facet	Demissie, Serkalem Lu, Darlene
author	Lu, Darlene
author_sort	Lu, Darlene
title	Clustering of temporal gene expression data with mixtures of mixed effects models
title_sort	clustering of temporal gene expression data with mixtures of mixed effects models
publishDate	2019
url	https://hdl.handle.net/2144/34905
work_keys_str_mv	AT ludarlene clusteringoftemporalgeneexpressiondatawithmixturesofmixedeffectsmodels
_version_	1719306434733146112

Similar Items