A robust approach based on Weibull distribution for clustering gene expression data

Bibliographic Details
Main Authors: Gong Binsheng, Li Xia, Wang Zhenzhen, Wang Huakun, Feng Lixin, Zhou Ying
Format: Article
Language: English
Published: BMC 2011-05-01
Series: Algorithms for Molecular Biology
Online Access: http://www.almob.org/content/6/1/14
Volume: 6, Issue: 1, Article: 14
DOI: 10.1186/1748-7188-6-14
ISSN: 1748-7188

Abstract

Background
Clustering is a widely used technique for analyzing gene expression data. Most clustering methods group genes by distance, while few group genes according to the similarity of the distributions of their expression levels. Furthermore, as biological annotation resources have accumulated, an increasing number of genes have been annotated into functional categories. As a result, evaluating clustering methods by the functional consistency of the resulting clusters is of great interest.

Results
In this paper, we propose WDCM (Weibull Distribution-based Clustering Method), a robust approach for clustering gene expression data in which the expression levels of each gene are treated as a random variable following its own Weibull distribution. WDCM is based on the premise that genes with similar expression profiles have similar distribution parameters, so genes are clustered by their Weibull distribution parameters. We applied WDCM to three cancer gene expression data sets (lung cancer, B-cell follicular lymphoma, and bladder carcinoma) and obtained good clustering results. We compared WDCM with k-means and the Self-Organizing Map (SOM) using functional annotation information from the Gene Ontology (GO); the functional annotation ratios of WDCM were higher than those of the other methods. We also used an external measure, the Adjusted Rand Index, to validate WDCM. The comparative results demonstrate that WDCM provides better clustering performance than the k-means and SOM algorithms. An advantage of WDCM is that it can cluster incomplete gene expression data without imputing the missing values. The robustness of WDCM was also evaluated on incomplete data sets.

Conclusions
The results demonstrate that WDCM produces clusters with more consistent functional annotations than the other methods. WDCM was also shown to be robust and capable of clustering gene expression data containing a small number of missing values.
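The abstract describes WDCM only at a high level: fit a Weibull distribution to each gene's expression values, then group genes by the fitted parameters. The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. Assumptions: expression values are strictly positive; each gene's shape and scale are estimated by maximum likelihood from its observed samples only, with missing entries dropped rather than imputed; and plain k-means on (shape, log scale) stands in for the grouping step, which the abstract does not specify. The names `fit_weibull`, `kmeans`, and `wdcm_like_clusters` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def fit_weibull(values: Sequence[Optional[float]], tol: float = 1e-10) -> Tuple[float, float]:
    """Maximum-likelihood (shape, scale) of a Weibull fit to one gene's values.

    Assumption: observed values are strictly positive. Missing entries
    (None or NaN) are dropped, not imputed.
    """
    xs = [v for v in values if v is not None and not math.isnan(v)]
    if len(xs) < 2 or min(xs) <= 0:
        raise ValueError("need at least two positive observed values")
    logs = [math.log(x) for x in xs]
    n, top = len(logs), max(logs)
    mean_log = sum(logs) / n

    def weights(k: float) -> List[float]:
        # x_i**k divided by max(x)**k, so large shapes cannot overflow
        return [math.exp(k * (lg - top)) for lg in logs]

    def score(k: float) -> float:
        # Shape likelihood equation; increasing in k, its root is the MLE
        w = weights(k)
        return sum(wi * lg for wi, lg in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:  # expand the bracket until the sign changes
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("values are (nearly) constant; shape is unbounded")
    while hi - lo > tol * hi:  # bisection on the shape equation
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    # Scale MLE: (mean of x**k) ** (1/k), evaluated in log space
    scale = math.exp(top + math.log(sum(weights(k)) / n) / k)
    return k, scale


def kmeans(points: List[Tuple[float, float]], k: int, iters: int = 100) -> List[int]:
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""

    def d2(a: Tuple[float, float], b: Tuple[float, float]) -> float:
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(d2(p, c) for c in centres)))
    labels: List[int] = []
    for _ in range(iters):
        new = [min(range(k), key=lambda j: d2(p, centres[j])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels


def wdcm_like_clusters(matrix: Sequence[Sequence[Optional[float]]], n_clusters: int) -> List[int]:
    """Hypothetical WDCM-style pipeline: one Weibull fit per gene (row), then
    k-means on (shape, log scale). The clustering step is an assumption."""
    feats = []
    for row in matrix:
        shape, scale = fit_weibull(row)
        feats.append((shape, math.log(scale)))
    return kmeans(feats, n_clusters)


if __name__ == "__main__":
    random.seed(0)

    def gene(shape: float, scale: float, n: int = 200) -> List[Optional[float]]:
        # random.weibullvariate(alpha, beta): alpha is scale, beta is shape
        row: List[Optional[float]] = [random.weibullvariate(scale, shape) for _ in range(n)]
        for i in range(0, n, 17):  # mark every 17th sample as missing
            row[i] = None
        return row

    data = [gene(1.5, 2.0) for _ in range(3)] + [gene(5.0, 10.0) for _ in range(3)]
    print(wdcm_like_clusters(data, 2))
```

Because missing entries are simply left out of each gene's fit, every gene is still summarized by two parameters however many samples it has; how well the resulting clusters hold up as the missing-data rate grows is an empirical question the sketch does not address.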
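The abstract also reports validation with the Adjusted Rand Index (ARI), which compares a clustering with a reference partition and corrects the Rand index for chance agreement: identical partitions score 1, and random labelings score about 0 on average. The sketch below computes the standard Hubert and Arabie form from the contingency table of two labelings. The function name `adjusted_rand_index` is hypothetical, and the reference partition the authors compared against is not stated in this record.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Hubert-Arabie ARI between two labelings of the same items.

    ARI = (sum_ij C(n_ij, 2) - E) / (0.5 * (sum_i C(a_i, 2) + sum_j C(b_j, 2)) - E),
    where E = sum_i C(a_i, 2) * sum_j C(b_j, 2) / C(n, 2).
    """
    if len(truth) != len(pred):
        raise ValueError("labelings must have equal length")
    n = len(truth)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs of items placed together in both labelings (contingency cells)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs placed together within each labeling separately (row/column totals)
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # both labelings trivial in the same way
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` gives 4/7 (about 0.571), while comparing a labeling with a renamed copy of itself gives 1.0, since ARI depends only on which items are grouped together, not on the label values.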