Partial mixture model for tight clustering of gene expression time-course

Abstract. Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, i...

Bibliographic Details
Main Authors: Li Chang-Tsun, Yuan Yinyin, Wilson Roland
Format: Article
Language: English
Published: BMC 2008-06-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/287
id doaj-a828a26085be4df181ab7c7ec25465dc
record_format Article
spelling doaj-a828a26085be4df181ab7c7ec25465dc; 2020-11-25T00:25:00Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2008-06-01; 9; 1; 287; 10.1186/1471-2105-9-287; Partial mixture model for tight clustering of gene expression time-course; Li Chang-Tsun; Yuan Yinyin; Wilson Roland; http://www.biomedcentral.com/1471-2105/9/287
collection DOAJ
language English
format Article
sources DOAJ
author Li Chang-Tsun
Yuan Yinyin
Wilson Roland
spellingShingle Li Chang-Tsun
Yuan Yinyin
Wilson Roland
Partial mixture model for tight clustering of gene expression time-course
BMC Bioinformatics
author_facet Li Chang-Tsun
Yuan Yinyin
Wilson Roland
author_sort Li Chang-Tsun
title Partial mixture model for tight clustering of gene expression time-course
title_short Partial mixture model for tight clustering of gene expression time-course
title_full Partial mixture model for tight clustering of gene expression time-course
title_fullStr Partial mixture model for tight clustering of gene expression time-course
title_full_unstemmed Partial mixture model for tight clustering of gene expression time-course
title_sort partial mixture model for tight clustering of gene expression time-course
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2008-06-01
description Abstract. Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, in the literature there is little work dedicated to this area of research. On the other hand, there has been extensive use of maximum likelihood techniques for model parameter estimation. By contrast, the minimum distance estimator has been largely ignored. Results: In this paper we show the inherent robustness of the minimum distance estimator, which makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, a partial mixture model that can naturally incorporate replicate information and allow scattered genes is formulated. We provide experimental results of simulated data fitting, where the minimum distance estimator demonstrates superior performance to the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores highest in Gene Ontology-driven evaluation, in comparison with four other popular clustering algorithms. Conclusion: For the first time, the partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm demonstrates the suitability of combining the partial mixture model with the minimum distance estimator in this field. We show that tight clustering is not only capable of generating a more profound understanding of the dataset under study, in accordance with established biological knowledge, but also presents interesting new hypotheses during interpretation of the clustering results. In particular, we provide biological evidence that scattered genes can be relevant and are interesting subjects for study, in contrast to prevailing opinion.
url http://www.biomedcentral.com/1471-2105/9/287
work_keys_str_mv AT lichangtsun partialmixturemodelfortightclusteringofgeneexpressiontimecourse
AT yuanyinyin partialmixturemodelfortightclusteringofgeneexpressiontimecourse
AT wilsonroland partialmixturemodelfortightclusteringofgeneexpressiontimecourse
_version_ 1725350353258414080