Fast motif recognition via application of statistical thresholds
Abstract. Background: Improving the accuracy and efficiency of motif recognition is an important computational challenge that has application to detecting transcription factor binding sites in genomic data. Closely related to motif recognition is the C...
Main Authors: | King James, Boucher Christina |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2010-01-01 |
Series: | BMC Bioinformatics |
id |
doaj-bcf5b2fee1114f39a6aa3dc5197299f5 |
---|---|
record_format |
Article |
spelling |
doaj-bcf5b2fee1114f39a6aa3dc5197299f5; 2020-11-25T01:37:18Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2010-01-01; 11; Suppl 1; S11; 10.1186/1471-2105-11-S1-S11; Fast motif recognition via application of statistical thresholds; King James; Boucher Christina |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
King James Boucher Christina |
spellingShingle |
King James Boucher Christina Fast motif recognition via application of statistical thresholds BMC Bioinformatics |
author_facet |
King James Boucher Christina |
author_sort |
King James |
title |
Fast motif recognition via application of statistical thresholds |
title_short |
Fast motif recognition via application of statistical thresholds |
title_full |
Fast motif recognition via application of statistical thresholds |
title_fullStr |
Fast motif recognition via application of statistical thresholds |
title_full_unstemmed |
Fast motif recognition via application of statistical thresholds |
title_sort |
fast motif recognition via application of statistical thresholds |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2010-01-01 |
description |
Abstract. Background: Improving the accuracy and efficiency of motif recognition is an important computational challenge with applications to detecting transcription factor binding sites in genomic data. Closely related to motif recognition is the Consensus String decision problem, which asks, given a parameter d and a set of ℓ-length strings S = {s_1, ..., s_n}, whether there exists a consensus string that has Hamming distance at most d from every string in S. A set of strings S is pairwise bounded if the Hamming distance between any pair of strings in S is at most 2d. It is trivial to determine whether a set is pairwise bounded, and a set cannot have a consensus string unless it is pairwise bounded. We use Consensus String to determine whether or not a pairwise bounded set has a consensus. Unfortunately, Consensus String is NP-complete. The lack of an efficient method for solving the Consensus String problem has made it a computational bottleneck in MCL-WMR, a motif recognition program capable of solving difficult motif recognition problem instances.
Results: We focus on the development of a method for solving Consensus String quickly with a small probability of error. We apply this heuristic to develop a new motif recognition program, sMCL-WMR, which has impressive accuracy and efficiency. We demonstrate the performance of sMCL-WMR in detecting weak motifs in large data sets and in real genomic data sets, and compare its performance to that of other leading motif recognition programs. In our preliminary discussion of our Consensus String algorithm, we give insight into the issue of sampling pairwise bounded sets and discuss its relevance to motif recognition.
Conclusion: Our novel heuristic yields a state-of-the-art program, sMCL-WMR, that is capable of detecting weak motifs in data sets with a large number of strings. sMCL-WMR is orders of magnitude faster than its predecessor MCL-WMR and is capable of solving previously unsolved synthetic motif recognition problems. Lastly, sMCL-WMR shows impressive accuracy in detecting transcription factor binding sites in the genomic data used in the assessment of Tompa et al. |
work_keys_str_mv |
AT kingjames fastmotifrecognitionviaapplicationofstatisticalthresholds AT boucherchristina fastmotifrecognitionviaapplicationofstatisticalthresholds |
_version_ |
1725058413757464576 |
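
The definitions in the abstract (Hamming distance, pairwise bounded sets, and the Consensus String decision problem) can be illustrated with a small sketch. This is not the statistical heuristic used by sMCL-WMR, which the record does not describe; it is an exhaustive check, exponential in the string length, written only to make the definitions concrete. The function names and the DNA alphabet `"ACGT"` are assumptions chosen for illustration.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_pairwise_bounded(strings, d):
    """True if every pair of strings is within Hamming distance 2*d."""
    return all(hamming(a, b) <= 2 * d for a, b in combinations(strings, 2))


def has_consensus(strings, d, alphabet="ACGT"):
    """Brute-force Consensus String check (illustrative, not the paper's method).

    Tries every string over `alphabet` of the right length and reports
    whether one lies within Hamming distance d of every input string.
    """
    if not strings:
        return True
    # Necessary condition from the abstract: no consensus without
    # pairwise boundedness, so this cheap test can reject early.
    if not is_pairwise_bounded(strings, d):
        return False
    length = len(strings[0])
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        if all(hamming(candidate, s) <= d for s in strings):
            return True
    return False


if __name__ == "__main__":
    # Pairwise bounded for d = 1, yet no single string is within
    # distance 1 of all three: boundedness is necessary, not sufficient.
    example = ["AA", "CC", "GG"]
    print(is_pairwise_bounded(example, 1))
    print(has_consensus(example, 1))
    print(has_consensus(["AA", "CC"], 1))
```

The three-string example shows why the pairwise bounded test alone cannot replace a Consensus String solver: the set passes the cheap test but has no consensus, which is the case that requires the expensive (NP-complete) decision step.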