SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model

Abstract. Background: Discrimination of transcription factor binding sites (TFBS) from background sequences plays a key role in computational motif discovery. Current clustering-based algorithms employ a homogeneous model for problem solving, which assumes...


Bibliographic Details
Main Authors: Wang Dianhui, Lee Nung
Format: Article
Language: English
Published: BMC 2011-02-01
Series: BMC Bioinformatics
id doaj-aee15b510d87452cba16223e53bf1766
record_format Article
spelling doaj-aee15b510d87452cba16223e53bf1766; 2020-11-25T01:35:12Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2011-02-01; 12; Suppl 1; S16; 10.1186/1471-2105-12-S1-S16; SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model; Wang Dianhui; Lee Nung
collection DOAJ
language English
format Article
sources DOAJ
author Wang Dianhui
Lee Nung
spellingShingle Wang Dianhui
Lee Nung
SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model
BMC Bioinformatics
author_facet Wang Dianhui
Lee Nung
author_sort Wang Dianhui
title SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model
title_short SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model
title_full SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model
title_fullStr SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model
title_full_unstemmed SOMEA: self-organizing map based extraction algorithm for DNA motif identification with heterogeneous model
title_sort somea: self-organizing map based extraction algorithm for dna motif identification with heterogeneous model
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2011-02-01
description Abstract
Background: Discrimination of transcription factor binding sites (TFBS) from background sequences plays a key role in computational motif discovery. Current clustering-based algorithms employ a homogeneous model for problem solving, which assumes that motifs and background signals can be equivalently characterized. This assumption is limiting because the two kinds of sequence signal have distinct properties.
Results: This paper develops a Self-Organizing Map (SOM) based clustering algorithm for extracting binding sites in DNA sequences. Our framework is based on a novel intra-node soft competitive procedure that aims to maximize discrimination of motifs from background signals in datasets. The intra-node competition applies an adaptive weighting technique to two different signal models, so that each of the two classes of signals is better represented. Using several real and artificial datasets, we compared our proposed method with several motif discovery tools. Compared to SOMBRERO, a state-of-the-art SOM based motif discovery tool, our algorithm achieves a significant improvement in average precision (about 27%) on the real datasets without compromising sensitivity. Our method also compared favourably against other motif discovery tools.
Conclusions: Motif discovery with a model-based clustering framework should consider using a heterogeneous model to represent the two classes of signals in DNA sequences. Such a heterogeneous model can achieve better signal discrimination than the homogeneous model.
work_keys_str_mv AT wangdianhui someaselforganizingmapbasedextractionalgorithmfordnamotifidentificationwithheterogeneousmodel
AT leenung someaselforganizingmapbasedextractionalgorithmfordnamotifidentificationwithheterogeneousmodel
_version_ 1725067850918395904
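
The abstract in this record speaks of an intra-node soft competition between two different signal models but gives no formulas. The sketch below is only one possible reading of that idea and is not the SOMEA algorithm itself: it assumes a position-specific frequency matrix as the motif model, a position-independent base composition as the background model, and a logistic function of the log-likelihood ratio as the weighting. All function names (`position_model`, `composition_model`, `motif_weight`) and the example k-mers are hypothetical.

```python
import math

BASES = "ACGT"

def position_model(kmers, pseudocount=1.0):
    """Assumed motif model: one base distribution per k-mer position."""
    length = len(kmers[0])
    model = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for kmer in kmers:
            counts[kmer[i]] += 1.0
        total = sum(counts.values())
        model.append({b: counts[b] / total for b in BASES})
    return model

def composition_model(kmers, pseudocount=1.0):
    """Assumed background model: base frequencies that ignore position."""
    counts = {b: pseudocount for b in BASES}
    for kmer in kmers:
        for base in kmer:
            counts[base] += 1.0
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}

def motif_weight(kmer, pos_model, comp_model):
    """Soft competition (assumption): weight in (0, 1) given to the motif
    model, computed as a logistic function of the log-likelihood ratio."""
    motif_ll = sum(math.log(pos_model[i][b]) for i, b in enumerate(kmer))
    background_ll = sum(math.log(comp_model[b]) for b in kmer)
    return 1.0 / (1.0 + math.exp(background_ll - motif_ll))

# Hypothetical k-mers assigned to a single map node.
node_kmers = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCA"]
pos = position_model(node_kmers)
comp = composition_model(node_kmers)

for candidate in ("TGACTCA", "CATGACT"):
    print(candidate, round(motif_weight(candidate, pos, comp), 3))
```

Both candidates have the same base composition, so the background model scores them identically; only the position-specific model separates them. This illustrates why two differently structured models can discriminate where a single shared model type might not.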