Approximation properties of haplotype tagging

Abstract. Background: Single nucleotide polymorphisms (SNPs) are locations at which the genomic sequences of population members differ. Since these differences are known to follow patterns, disease association studies are facilitated by identifying SNPs t...

Bibliographic Details
Main Authors: Dreiseitl Stephan, Vinterbo Staal A, Ohno-Machado Lucila
Format: Article
Language: English
Published: BMC 2006-01-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/7/8
id doaj-03662511bbee4fa693c4344e7f57186a
record_format Article
spelling doaj-03662511bbee4fa693c4344e7f57186a; 2020-11-25T02:45:26Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2006-01-01; 7; 1; 8; 10.1186/1471-2105-7-8; Approximation properties of haplotype tagging; Dreiseitl Stephan; Vinterbo Staal A; Ohno-Machado Lucila; http://www.biomedcentral.com/1471-2105/7/8
collection DOAJ
language English
format Article
sources DOAJ
author Dreiseitl Stephan
Vinterbo Staal A
Ohno-Machado Lucila
title Approximation properties of haplotype tagging
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2006-01-01
description Abstract. Background: Single nucleotide polymorphisms (SNPs) are locations at which the genomic sequences of population members differ. Since these differences are known to follow patterns, disease association studies are facilitated by identifying SNPs that allow the unique identification of such patterns. This process, known as haplotype tagging, is formulated as a combinatorial optimization problem and analyzed in terms of complexity and approximation properties. Results: It is shown that the tagging problem is NP-hard. It is approximable within $1 + \ln((n^2 - n)/2)$ for $n$ haplotypes, but not approximable within $(1 - \varepsilon)\ln(n/2)$ for any $\varepsilon > 0$ unless $\mathrm{NP} \subset \mathrm{DTIME}(n^{\log\log n})$. A simple, easily implementable algorithm that achieves the above upper bound on solution quality is presented. This algorithm has running time $O(\text{[graphic 1471-2105-7-8-i1.gif]}\,(2m - p + 1)) \le O(m(n^2 - n)/2)$, where $p \le \min(n, m)$, for $n$ haplotypes of size $m$. Because the approximation bound is shown to be asymptotically tight, the algorithm presented is optimal with respect to this asymptotic bound. Conclusion: The haplotype tagging problem is hard, but approachable with a fast, practical, and surprisingly simple algorithm that cannot be significantly improved upon on a single-processor machine. Hence, significant improvement in the computational effort expended can only be expected if that effort is distributed and done in parallel.
url http://www.biomedcentral.com/1471-2105/7/8
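
The abstract does not spell out the algorithm, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It treats tagging as a set-cover problem over the $(n^2 - n)/2$ haplotype pairs and applies the standard greedy set-cover heuristic, whose well-known guarantee has the same form as the bound $1 + \ln((n^2 - n)/2)$ quoted in the description. The function names `greedy_tag_snps` and `approximation_bound`, and the encoding of haplotypes as equal-length strings, are hypothetical choices made for this example.

```python
from itertools import combinations
from math import log


def greedy_tag_snps(haplotypes):
    """Greedily pick SNP positions that tell every pair of distinct haplotypes apart.

    Assumption: haplotypes are equal-length strings, one character per SNP.
    This is the generic greedy set-cover heuristic, not necessarily the
    algorithm analyzed in the paper.
    """
    if not haplotypes:
        return []
    m = len(haplotypes[0])
    if any(len(h) != m for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    # Universe to cover: index pairs of haplotypes that differ somewhere.
    uncovered = {
        (i, j)
        for i, j in combinations(range(len(haplotypes)), 2)
        if haplotypes[i] != haplotypes[j]
    }

    tags = []
    while uncovered:
        best_col, best_pairs = None, set()
        for col in range(m):
            if col in tags:
                continue
            # Pairs still uncovered that this SNP position would separate.
            pairs = {
                (i, j)
                for (i, j) in uncovered
                if haplotypes[i][col] != haplotypes[j][col]
            }
            if len(pairs) > len(best_pairs):
                best_col, best_pairs = col, pairs
        # Every uncovered pair differs at some unused position,
        # so best_pairs is never empty here.
        tags.append(best_col)
        uncovered -= best_pairs
    return sorted(tags)


def approximation_bound(n):
    """Evaluate 1 + ln((n^2 - n)/2) for n >= 2 haplotypes."""
    if n < 2:
        raise ValueError("the bound is defined here for n >= 2")
    return 1 + log(n * (n - 1) / 2)


if __name__ == "__main__":
    sample = ["000", "011", "101", "110"]  # made-up toy data
    print(greedy_tag_snps(sample))         # [0, 1]
    print(approximation_bound(len(sample)))
```

On the toy input, positions 0 and 1 suffice because the four haplotypes restricted to those positions read 00, 01, 10, and 11. Each greedy round scans the remaining positions against the uncovered pairs, which keeps the sketch short but makes no attempt to match the running time stated in the description.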