Selecting additional tag SNPs for tolerating missing data in genotyping
Abstract. Background: Recent studies have shown that the patterns of linkage disequilibrium observed in human populations have a block-like structure, and a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype pattern...
Main Authors: | Chen Ting, Zhang Kui, Huang Yao-Ting, Chao Kun-Mao |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2005-11-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/6/263 |
id |
doaj-be2ed05b51774a999ad6f7e253275d1f |
---|---|
record_format |
Article |
spelling |
doaj-be2ed05b51774a999ad6f7e253275d1f 2020-11-24T21:36:34Z eng BMC BMC Bioinformatics 1471-2105 2005-11-01 6 1 263 10.1186/1471-2105-6-263 Selecting additional tag SNPs for tolerating missing data in genotyping Chen Ting Zhang Kui Huang Yao-Ting Chao Kun-Mao <p>Abstract</p> <p>Background</p> <p>Recent studies have shown that the patterns of linkage disequilibrium observed in human populations have a block-like structure, and a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype patterns in the block. In reality, some tag SNPs may be missing, and we may fail to distinguish two distinct haplotypes due to the ambiguity caused by missing data.</p> <p>Results</p> <p>We show that there exists a subset of SNPs (referred to as robust tag SNPs) which can still distinguish all distinct haplotypes even when some SNPs are missing. The problem of finding a minimum set of robust tag SNPs is shown to be NP-hard. To find robust tag SNPs efficiently, we propose two greedy algorithms and one linear programming relaxation algorithm. The experimental results indicate that (1) the solutions found by these algorithms are quite close to the optimal solution; (2) the genotyping cost saved by using tag SNPs can be as high as 80%; and (3) genotyping additional tag SNPs for tolerating missing data is still cost-effective.</p> <p>Conclusion</p> <p>Genotyping robust tag SNPs is more practical than genotyping only the minimum tag SNPs if we cannot avoid the occurrence of missing data. Our theoretical analysis and experimental results show that our algorithms are efficient and that the solutions they find are close to the optimal solution.</p> http://www.biomedcentral.com/1471-2105/6/263 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Chen Ting Zhang Kui Huang Yao-Ting Chao Kun-Mao |
spellingShingle |
Chen Ting Zhang Kui Huang Yao-Ting Chao Kun-Mao Selecting additional tag SNPs for tolerating missing data in genotyping BMC Bioinformatics |
author_facet |
Chen Ting Zhang Kui Huang Yao-Ting Chao Kun-Mao |
author_sort |
Chen Ting |
title |
Selecting additional tag SNPs for tolerating missing data in genotyping |
title_short |
Selecting additional tag SNPs for tolerating missing data in genotyping |
title_full |
Selecting additional tag SNPs for tolerating missing data in genotyping |
title_fullStr |
Selecting additional tag SNPs for tolerating missing data in genotyping |
title_full_unstemmed |
Selecting additional tag SNPs for tolerating missing data in genotyping |
title_sort |
selecting additional tag snps for tolerating missing data in genotyping |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2005-11-01 |
description |
<p>Abstract</p> <p>Background</p> <p>Recent studies have shown that the patterns of linkage disequilibrium observed in human populations have a block-like structure, and a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype patterns in the block. In reality, some tag SNPs may be missing, and we may fail to distinguish two distinct haplotypes due to the ambiguity caused by missing data.</p> <p>Results</p> <p>We show that there exists a subset of SNPs (referred to as robust tag SNPs) which can still distinguish all distinct haplotypes even when some SNPs are missing. The problem of finding a minimum set of robust tag SNPs is shown to be NP-hard. To find robust tag SNPs efficiently, we propose two greedy algorithms and one linear programming relaxation algorithm. The experimental results indicate that (1) the solutions found by these algorithms are quite close to the optimal solution; (2) the genotyping cost saved by using tag SNPs can be as high as 80%; and (3) genotyping additional tag SNPs for tolerating missing data is still cost-effective.</p> <p>Conclusion</p> <p>Genotyping robust tag SNPs is more practical than genotyping only the minimum tag SNPs if we cannot avoid the occurrence of missing data. Our theoretical analysis and experimental results show that our algorithms are efficient and that the solutions they find are close to the optimal solution.</p> |
url |
http://www.biomedcentral.com/1471-2105/6/263 |
work_keys_str_mv |
AT chenting selectingadditionaltagsnpsfortoleratingmissingdataingenotyping AT zhangkui selectingadditionaltagsnpsfortoleratingmissingdataingenotyping AT huangyaoting selectingadditionaltagsnpsfortoleratingmissingdataingenotyping AT chaokunmao selectingadditionaltagsnpsfortoleratingmissingdataingenotyping |
_version_ |
1725940637092544512 |
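
The abstract describes robust tag SNPs only in words, so a small illustration may help. The sketch below is not the authors' algorithm; it is a generic greedy heuristic written under stated assumptions: haplotypes are equal-length strings over SNP alleles, "tolerating missing data" is taken to mean that every pair of distinct haplotypes must differ at no fewer than `m + 1` selected SNPs (so that losing any `m` of them still leaves one distinguishing SNP), and the names `greedy_robust_tags`, `is_robust`, and `m` are hypothetical.

```python
from itertools import combinations


def is_robust(haplotypes, tags, m=0):
    """Return True if every pair of distinct haplotypes differs at >= m + 1 tag SNPs."""
    for a, b in combinations(haplotypes, 2):
        if a == b:
            continue  # identical patterns never need to be told apart
        differing = sum(1 for s in tags if a[s] != b[s])
        if differing < m + 1:
            return False
    return True


def greedy_robust_tags(haplotypes, m=0):
    """Greedily choose SNP indices until each distinct pair is covered m + 1 times.

    Illustrative heuristic only (assumption): at each step, take the SNP that
    distinguishes the most pairs that still need coverage.
    """
    n_snps = len(haplotypes[0])
    pairs = [
        (i, j)
        for i, j in combinations(range(len(haplotypes)), 2)
        if haplotypes[i] != haplotypes[j]
    ]
    need = {p: m + 1 for p in pairs}  # remaining coverage required per pair
    remaining = set(range(n_snps))
    chosen = []

    def gain(s):
        # Number of still-uncovered pairs that SNP s would help distinguish.
        return sum(
            1
            for (i, j), k in need.items()
            if k > 0 and haplotypes[i][s] != haplotypes[j][s]
        )

    while any(k > 0 for k in need.values()):
        best = max(sorted(remaining), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("no SNP subset can tolerate m missing SNPs in this block")
        chosen.append(best)
        remaining.remove(best)
        for (i, j) in need:
            if need[(i, j)] > 0 and haplotypes[i][best] != haplotypes[j][best]:
                need[(i, j)] -= 1
    return sorted(chosen)


if __name__ == "__main__":
    block = ["00000", "00111", "11001", "11110"]  # toy haplotype block (made up)
    print(greedy_robust_tags(block, m=0))
    print(greedy_robust_tags(block, m=1))
```

Calling the function with `m=0` corresponds to ordinary tag SNP selection under these assumptions, while a larger `m` buys tolerance to missing genotypes at the cost of genotyping more SNPs. Because the underlying problem is reported to be NP-hard, a greedy choice like this one is not guaranteed to return a minimum set.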