Selecting additional tag SNPs for tolerating missing data in genotyping

Abstract. Background: Recent studies have shown that the patterns of linkage disequilibrium observed in human populations have a block-like structure, and a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype pattern...

Bibliographic Details
Main Authors: Chen Ting, Zhang Kui, Huang Yao-Ting, Chao Kun-Mao
Format: Article
Language: English
Published: BMC 2005-11-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/6/263
id doaj-be2ed05b51774a999ad6f7e253275d1f
record_format Article
spelling doaj-be2ed05b51774a999ad6f7e253275d1f | 2020-11-24T21:36:34Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2005-11-01 | 6 | 1 | 263 | 10.1186/1471-2105-6-263 | Selecting additional tag SNPs for tolerating missing data in genotyping | Chen Ting | Zhang Kui | Huang Yao-Ting | Chao Kun-Mao | http://www.biomedcentral.com/1471-2105/6/263
collection DOAJ
language English
format Article
sources DOAJ
author Chen Ting
Zhang Kui
Huang Yao-Ting
Chao Kun-Mao
title Selecting additional tag SNPs for tolerating missing data in genotyping
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2005-11-01
description Abstract
Background: Recent studies have shown that the patterns of linkage disequilibrium observed in human populations have a block-like structure, and a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype patterns in the block. In reality, some tag SNPs may be missing, and we may fail to distinguish two distinct haplotypes due to the ambiguity caused by missing data.
Results: We show there exists a subset of SNPs (referred to as robust tag SNPs) which can still distinguish all distinct haplotypes even when some SNPs are missing. The problem of finding minimum robust tag SNPs is shown to be NP-hard. To find robust tag SNPs efficiently, we propose two greedy algorithms and one linear programming relaxation algorithm. The experimental results indicate that (1) the solutions found by these algorithms are quite close to the optimal solution; (2) the genotyping cost saved by using tag SNPs can be as high as 80%; and (3) genotyping additional tag SNPs for tolerating missing data is still cost-effective.
Conclusion: Genotyping robust tag SNPs is more practical than genotyping only the minimum tag SNPs if we cannot avoid the occurrence of missing data. Our theoretical analysis and experimental results show that our algorithms are efficient and that the solutions they find are close to the optimal solution.
url http://www.biomedcentral.com/1471-2105/6/263
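
The robust tag SNP idea summarized in the abstract can be illustrated with a small sketch. It rests on assumptions that this record does not state: haplotype patterns are given as equal-length strings over the alleles "0" and "1", and a selection tolerates m missing SNPs when every pair of distinct patterns differs at no fewer than m + 1 selected positions. The function name greedy_robust_tag_snps and the tie-breaking rule are hypothetical. This is a generic greedy covering heuristic, not necessarily either of the two greedy algorithms proposed in the article.

```python
from itertools import combinations


def greedy_robust_tag_snps(haplotypes, m=0):
    """Greedily pick SNP positions so that every pair of distinct haplotype
    patterns differs at no fewer than m + 1 of the picked positions.

    Hypothetical illustration only; not the article's implementation.
    """
    n_snps = len(haplotypes[0])
    # Only pairs of distinct patterns need to be told apart.
    pairs = [(a, b) for a, b in combinations(range(len(haplotypes)), 2)
             if haplotypes[a] != haplotypes[b]]
    # need[pair] = number of additional distinguishing SNPs still required.
    need = {pair: m + 1 for pair in pairs}
    remaining = set(range(n_snps))
    chosen = []

    def gain(snp):
        # Count the unsatisfied pairs that this SNP would help distinguish.
        return sum(1 for (a, b), k in need.items()
                   if k > 0 and haplotypes[a][snp] != haplotypes[b][snp])

    while any(k > 0 for k in need.values()):
        # Ties are broken by the lowest SNP index (an arbitrary choice).
        best = max(sorted(remaining), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("no SNP subset tolerates m missing SNPs")
        chosen.append(best)
        remaining.remove(best)
        for (a, b), k in need.items():
            if k > 0 and haplotypes[a][best] != haplotypes[b][best]:
                need[(a, b)] = k - 1
    return sorted(chosen)
```

For example, calling greedy_robust_tag_snps(["0000", "0011", "0101", "1001"], m=1) asks for a selection that still separates all four patterns when any single selected SNP is missing. Raising m requires more SNPs, which mirrors the trade-off between genotyping cost and tolerance to missing data described in the abstract.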