Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

Bibliographic Details
Main Authors: Ujjwal Maulik, Saurav Mallik, Anirban Mukhopadhyay, Sanghamitra Bandyopadhyay
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2015-01-01
Series: PLoS ONE 10(4): e0119448
Online Access:https://doi.org/10.1371/journal.pone.0119448
ISSN: 1932-6203

Abstract
Microarrays and beadchips are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering is the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (statistical biclustering-based rule mining), that identifies a special type of rules and potential biomarkers in biological datasets by integrating statistical techniques with binary inclusion-maximal biclustering. First, a novel statistical strategy eliminates insignificant, low-significance, and redundant genes in such a way that the significance assessment respects the data distribution (normal or non-normal). The data are then discretized and post-discretized in succession. Next, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets, from which the corresponding rules are extracted. The proposed rule mining method outperforms the other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of all frequent itemsets; as a result, it reduces elapsed time and can handle large datasets.

Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database, and a frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. We also use the evolved rules to classify the data, to assess how accurately they describe the remaining test (unknown) data, and we compare the average classification accuracy and other related measures with those of other rule-based classifiers. Statistical significance tests are performed to verify the statistical relevance of the comparative results. For a fair comparison, each of the other rule mining methods and rule-based classifiers also starts from the same post-discretized data matrix. Finally, we include an integrated analysis of gene expression and methylation to determine the epigenetic effect (i.e., the effect of methylation) on gene expression levels.
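The abstract says genes are filtered with a significance test that respects the data distribution (normal or non-normal), but it does not name the tests. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure: it uses the Jarque-Bera statistic as the normality check, Welch's t-test for normal-looking groups, and the Mann-Whitney U test otherwise, with large-sample normal approximations so that only the Python standard library is needed. All function names (`select_genes` and helpers) and the thresholds `alpha` and `normal_alpha` are hypothetical, and the redundancy filtering mentioned in the abstract is not shown.

```python
import math
from statistics import mean, variance


def jarque_bera_p(x):
    """Jarque-Bera normality test; p-value from the asymptotic chi-square(2) law."""
    n = len(x)
    m = mean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    if m2 == 0:
        return 0.0  # constant values: treat as non-normal
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    jb = n / 6.0 * ((m3 / m2 ** 1.5) ** 2 + (m4 / m2 ** 2 - 3.0) ** 2 / 4.0)
    return math.exp(-jb / 2.0)  # survival function of chi-square with 2 df


def two_sided_p(z):
    """Two-sided p-value for a standard-normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def welch_p(a, b):
    """Welch's t statistic; p-value by a large-sample normal approximation (assumption)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 1.0 if mean(a) == mean(b) else 0.0
    return two_sided_p((mean(a) - mean(b)) / se)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r


def mann_whitney_p(a, b):
    """Mann-Whitney U test with a normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    r = ranks(list(a) + list(b))
    u1 = sum(r[:n1]) - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return two_sided_p((u1 - mu) / sigma)


def select_genes(expr, group_a, group_b, alpha=0.05, normal_alpha=0.05):
    """Return {gene: p} for genes whose two-group difference is significant at alpha.

    expr maps gene name -> list of values over all samples; group_a and group_b
    are sample indices. Welch's test is used when both groups pass the normality
    check, otherwise the rank-based Mann-Whitney test.
    """
    selected = {}
    for gene, values in expr.items():
        a = [values[i] for i in group_a]
        b = [values[i] for i in group_b]
        normal = jarque_bera_p(a) >= normal_alpha and jarque_bera_p(b) >= normal_alpha
        p = welch_p(a, b) if normal else mann_whitney_p(a, b)
        if p < alpha:
            selected[gene] = p
    return selected


if __name__ == "__main__":
    expr = {
        "geneA": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
        "geneB": [1.0, 1.1, 0.9, 1.2, 1.1, 0.9, 1.0, 1.2],
    }
    print(select_genes(expr, group_a=range(4), group_b=range(4, 8)))
```

The normal approximations are only reasonable for moderately large groups; with SciPy available, `scipy.stats.shapiro`, `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.mannwhitneyu` would be natural drop-in replacements for the three tests.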
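The abstract pairs discretization with binary inclusion-maximal biclustering to obtain maximal frequent closed homogeneous itemsets, but it gives neither the discretization rule nor the biclustering algorithm. As a rough illustration only, the sketch below (i) binarizes each gene at its median into hypothetical items `"<gene>+"` and `"<gene>-"`, a stand-in for the unspecified discretization and post-discretization; (ii) enumerates closed itemsets by intersecting sample transactions, relying on the standard correspondence between inclusion-maximal all-ones biclusters of a binary matrix and closed itemsets paired with their supporting samples; (iii) keeps the maximal frequent ones; and (iv) turns each into a rule "itemset implies the majority class" with a confidence. The homogeneity condition is not modeled, all function names are hypothetical, and the brute-force enumeration is exponential in the worst case.

```python
from collections import Counter
from statistics import median


def binarize(expr, samples):
    """Hypothetical discretization: '<gene>+' if value >= the gene's median, else '<gene>-'."""
    items = {s: set() for s in samples}
    for gene, values in expr.items():
        cut = median(values)
        for s, v in zip(samples, values):
            items[s].add(f"{gene}+" if v >= cut else f"{gene}-")
    return {s: frozenset(v) for s, v in items.items()}


def closed_itemsets(transactions):
    """Closed itemsets are the intersections of non-empty groups of transactions.

    transactions: list of frozensets (one per sample).
    """
    closed = set()
    for t in transactions:
        # Intersect the new transaction with every closed set found so far, then add it.
        closed |= {c & t for c in closed}
        closed.add(t)
    closed.discard(frozenset())  # the empty itemset carries no rule
    return closed


def support(itemset, transactions):
    """Number of transactions (samples) containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def maximal_frequent(transactions, min_sup):
    """Frequent closed itemsets that are not a proper subset of another frequent one."""
    frequent = [c for c in closed_itemsets(transactions)
                if support(c, transactions) >= min_sup]
    return [c for c in frequent if not any(c < d for d in frequent)]


def extract_rules(itemsets, transactions, labels):
    """Map each itemset to the majority class of its supporting samples, with confidence."""
    rules = []
    for itemset in itemsets:
        classes = Counter(y for t, y in zip(transactions, labels) if itemset <= t)
        if not classes:
            continue  # unsupported itemset: no rule
        label, count = classes.most_common(1)[0]
        rules.append((itemset, label, count / sum(classes.values())))
    return rules
```

For example, with transactions {A,B,C}, {A,B,C}, {A,B}, {C,D}, {C,D} and `min_sup=2`, the closed itemsets are {A,B,C}, {A,B}, {C} and {C,D}, and only {A,B,C} and {C,D} are maximal.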
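The abstract also states that the evolved rules are used to classify held-out test data and that the frequency of genes across rules is used to flag potential biomarkers, without detailing either step. A minimal sketch, assuming rules are `(itemset, class_label, confidence)` tuples whose items are named `"<gene>+"` or `"<gene>-"`, that a sample takes the label of its highest-confidence matching rule (a common rule-based classifier strategy, not necessarily StatBicRM's), and that unmatched samples receive a default label. The names `classify`, `gene_frequency` and `accuracy` are hypothetical.

```python
from collections import Counter


def classify(sample_items, rules, default=None):
    """Predict with the highest-confidence rule whose itemset the sample contains."""
    matching = [r for r in rules if r[0] <= sample_items]
    if not matching:
        return default  # no rule fires for this sample
    return max(matching, key=lambda r: r[2])[1]


def gene_frequency(rules):
    """Count rule occurrences per gene by stripping the trailing '+'/'-' from each item."""
    return Counter(item[:-1] for itemset, _, _ in rules for item in itemset)


def accuracy(test_items, test_labels, rules, default=None):
    """Fraction of test samples whose predicted label matches the true label."""
    predictions = [classify(x, rules, default) for x in test_items]
    return sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
```

Genes that recur across many rules, as reported by `gene_frequency(rules).most_common()`, would be the candidates to examine further, for instance with pathway or Gene Ontology enrichment.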