Identification of functionally related genes using data mining and data integration: a breast cancer case study

Abstract. Background: The identification of the organisation and dynamics of molecular pathways is crucial for the understanding of cell function. In order to reconstruct the molecular pathways in which a gene of interest is involved in regulating a cell,...

Bibliographic Details
Main Authors: Zucchi Ileana, Reinbold Rolland A, Vilardo Laura, Piscitelli Eleonora, Bertoli Gloria, Mosca Ettore, Milanesi Luciano
Format: Article
Language: English
Published: BMC 2009-10-01
Series: BMC Bioinformatics
id doaj-a192f8b804764f6eabdec508d511eaff
record_format Article
spelling doaj-a192f8b804764f6eabdec508d511eaff 2020-11-24T21:24:30Z eng BMC BMC Bioinformatics 1471-2105 2009-10-01 10 Suppl 12 S8 10.1186/1471-2105-10-S12-S8
collection DOAJ
language English
format Article
sources DOAJ
author Zucchi Ileana
Reinbold Rolland A
Vilardo Laura
Piscitelli Eleonora
Bertoli Gloria
Mosca Ettore
Milanesi Luciano
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2009-10-01
description Abstract

Background

The identification of the organisation and dynamics of molecular pathways is crucial for the understanding of cell function. In order to reconstruct the molecular pathways in which a gene of interest is involved in regulating a cell, it is important to identify the set of genes with which it interacts to determine cell function. In this context, the mining and integration of the large amount of publicly available data on the transcriptome and proteome states of a cell are a useful resource to complement biological research.

Results

We describe an approach for the identification of genes that interact with each other to regulate cell function. The strategy relies on the analysis of gene expression profile similarity across large datasets of expression data. During the similarity evaluation, the methodology determines the most significant subset of samples in which the evaluated genes are highly correlated. Hence, the strategy enables the exclusion of samples that are not relevant for each gene pair analysed. This feature is important when considering a large set of samples characterised by heterogeneous experimental conditions, where different pools of biological processes can be active across the samples. The putative partners of the studied gene are then further characterised by analysing the distribution of the Gene Ontology terms and integrating protein-protein interaction (PPI) data.

The strategy was applied to the analysis of the functional relationships of a gene of known function, Pyruvate Kinase, and to the prediction of functional partners of the human transcription factor TBX3. In both cases the analysis was done on a dataset composed of breast primary tumour expression data derived from the literature. Integration and analysis of PPI data confirmed the predictions of the methodology, since the genes identified as functionally related were associated with proteins that are close in the PPI network. Two genes among the predicted putative partners of TBX3 (GLI3 and GATA3) were confirmed by in vivo binding assays (crosslinking immunoprecipitation, X-ChIP), in which the putative DNA enhancer sequence sites of GATA3 and GLI3 were found to be bound by the Tbx3 protein.

Conclusion

The presented strategy is demonstrated to be an effective approach to identify genes that establish functional relationships. The methodology identifies and characterises genes with a similar expression profile, through data mining and the integration of data from publicly available resources, to contribute to a better understanding of gene regulation and cell function. The prediction of the TBX3 target genes GLI3 and GATA3 was experimentally confirmed.
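The abstract says that, for each gene pair, the method finds a subset of samples in which the two expression profiles are highly correlated, but it does not state how that subset is chosen or how its significance is assessed. As a rough illustration only, the Python sketch below assumes a simple greedy search that drops one sample at a time until the absolute Pearson correlation reaches a threshold. The function names (`pearson`, `correlated_subset`), the threshold `r_min`, and the floor `min_samples` are hypothetical and are not taken from the article.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation of two 1-D arrays (0.0 if either is constant)."""
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0


def correlated_subset(x, y, r_min=0.9, min_samples=5):
    """Greedily drop samples until |r| >= r_min or min_samples is reached.

    Assumption: this greedy rule stands in for the article's unspecified
    subset selection. Returns (indices of kept samples, correlation on them).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = list(range(len(x)))
    r = pearson(x[keep], y[keep])
    while abs(r) < r_min and len(keep) > min_samples:
        # Try removing each remaining sample; remember the most helpful removal.
        best_r, best_i = r, None
        for i in range(len(keep)):
            trial = keep[:i] + keep[i + 1:]
            r_trial = pearson(x[trial], y[trial])
            if abs(r_trial) > abs(best_r):
                best_r, best_i = r_trial, i
        if best_i is None:  # no single removal improves the correlation
            break
        del keep[best_i]
        r = best_r
    return keep, r


# Toy profiles: two genes agree in the first nine samples, not in the last.
gene_a = list(range(10))
gene_b = list(range(9)) + [-50]
kept_samples, r_subset = correlated_subset(gene_a, gene_b)
print(kept_samples, r_subset)
```

A greedy search like this tends to overstate correlation, because discarding enough samples makes almost any pair look related. A realistic version would therefore need a significance test on the retained subset, which is the part the abstract leaves unspecified.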