FISim: A new similarity measure between transcription factor binding sites based on the fuzzy integral

Bibliographic Details
Main Authors: Cano, Carlos; Lopez, Francisco J; Garcia, Fernando; Blanco, Armando
Format: Article
Language: English
Published: BMC 2009-07-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/10/224
ISSN: 1471-2105
Volume: 10, article 224
DOI: 10.1186/1471-2105-10-224
Source: DOAJ

Abstract

Background
Regulatory motifs describe sets of related transcription factor binding sites (TFBSs) and can be represented as position frequency matrices (PFMs). De novo identification of TFBSs is a crucial problem in computational biology, and it includes comparing putative motifs with one another and with known motifs. PFM similarity computations should take into account the relative importance of each nucleotide within a given position. Furthermore, biological data are inherently noisy and imprecise. Fuzzy set theory is particularly suitable for modeling imprecise data, and fuzzy integrals are highly appropriate for representing the interaction among different information sources.

Results
We propose FISim, a new similarity measure between PFMs based on the fuzzy integral of the distance of the nucleotides with respect to the information content of the positions. Unlike existing methods, FISim is designed to account for the greater contribution of better-conserved positions to binding affinity. FISim provides excellent results on sets of randomly generated motifs and outperforms the other methods on real datasets of related motifs. Furthermore, we propose a new clustering methodology, based on kernel theory and combined with FISim, to obtain groups of related motifs potentially bound by the same TFs; it provides more robust results than existing approaches.

Conclusion
FISim corrects a design flaw of the most popular methods, whose measures favour similarity at low-information-content positions. We use our measure to successfully identify motifs that describe binding sites for the same TF and to solve real-life problems. This study demonstrates the reliability of fuzzy technology for motif comparison tasks.
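The abstract describes FISim only at a high level: per-position nucleotide distances aggregated by a fuzzy integral that is weighted by the information content of the positions. The sketch below is not the authors' FISim formula, which this record does not give. It is a hypothetical illustration of the general idea, assuming (a) motifs that are already aligned and of equal length, (b) a per-position similarity of one minus half the L1 distance between frequency vectors, and (c) a Sugeno integral taken with respect to an additive fuzzy measure whose densities are the normalized mean information content of each aligned column pair. The function names `fuzzy_pfm_similarity`, `column_similarity`, and `sugeno_integral` are invented for this example.

```python
import math
from typing import List, Sequence

# A PFM is a list of positions; each position holds counts for A, C, G, T.
PFM = List[Sequence[float]]


def frequencies(column: Sequence[float]) -> List[float]:
    """Turn a column of nucleotide counts (assumed non-zero total) into frequencies."""
    total = float(sum(column))
    return [c / total for c in column]


def information_content(column: Sequence[float]) -> float:
    """Information content of one DNA position in bits (0 to 2), against a
    uniform background: IC = 2 + sum_b p_b * log2(p_b), with 0 * log 0 = 0."""
    return 2.0 + sum(p * math.log2(p) for p in frequencies(column) if p > 0)


def column_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical per-position similarity in [0, 1]: one minus half the
    L1 distance between the two frequency vectors."""
    pa, pb = frequencies(a), frequencies(b)
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(pa, pb))


def sugeno_integral(values: List[float], densities: List[float]) -> float:
    """Sugeno integral of `values` (each in [0, 1]) with respect to the additive
    fuzzy measure g(A) = sum of densities in A (densities sum to 1).
    Visit values in descending order and return max_i min(h_(i), g(A_i)),
    where A_i is the set of the i largest values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    best, cumulative = 0.0, 0.0
    for i in order:
        cumulative += densities[i]
        best = max(best, min(values[i], cumulative))
    return best


def fuzzy_pfm_similarity(m1: PFM, m2: PFM) -> float:
    """Illustrative (not FISim) similarity between two pre-aligned PFMs:
    per-position similarities aggregated by a Sugeno integral whose fuzzy
    measure gives more weight to high-information positions."""
    if len(m1) != len(m2) or not m1:
        raise ValueError("motifs must be non-empty and of equal length")
    sims = [column_similarity(a, b) for a, b in zip(m1, m2)]
    # Assumption: weight each aligned position by the mean IC of its two columns.
    ic = [(information_content(a) + information_content(b)) / 2 for a, b in zip(m1, m2)]
    total = sum(ic)
    if total == 0:  # no informative positions: fall back to uniform weights
        densities = [1.0 / len(ic)] * len(ic)
    else:
        densities = [w / total for w in ic]
    return sugeno_integral(sims, densities)


if __name__ == "__main__":
    ref = [[10, 0, 0, 0], [0, 10, 0, 0], [1, 1, 1, 1]]
    # Differs only at the poorly conserved third position.
    weak_mismatch = [[10, 0, 0, 0], [0, 10, 0, 0], [2, 2, 0, 0]]
    # Differs at the fully conserved first position.
    strong_mismatch = [[0, 0, 0, 10], [0, 10, 0, 0], [1, 1, 1, 1]]
    print(round(fuzzy_pfm_similarity(ref, weak_mismatch), 3))    # 0.889
    print(round(fuzzy_pfm_similarity(ref, strong_mismatch), 3))  # 0.5
```

Under these assumed weights, a mismatch at a poorly conserved position barely lowers the score, while a mismatch at a fully conserved position halves it, which is the behaviour the abstract argues a PFM similarity measure should have. A Choquet integral or a Sugeno λ-measure would be natural alternatives to the additive measure used here; this record does not say which fuzzy measure or integral FISim actually uses.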