Two-Phase and Family-Based Designs for Next-Generation Sequencing Studies

The cost of next-generation sequencing is now approaching that of early GWAS panels, but it is still out of reach for large epidemiologic studies, and the millions of rare variants expected pose challenges for distinguishing causal from non-causal variants. We review two types of designs for sequencing...


Bibliographic Details
Main Authors: Duncan C Thomas, Zhao Yang, Fan Yang
Format: Article
Language: English
Published: Frontiers Media S.A. 2013-12-01
Series: Frontiers in Genetics
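Subjects: Breast Neoplasms; Sequencing; colorectal cancer; two-phase sampling design; rare variant association; family-based study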
Online Access: http://journal.frontiersin.org/Journal/10.3389/fgene.2013.00276/full
id doaj-c51afb0480d74a4e931f320f6fbd684e
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Duncan C Thomas
Zhao Yang
Fan Yang
title Two-Phase and Family-Based Designs for Next-Generation Sequencing Studies
publisher Frontiers Media S.A.
series Frontiers in Genetics
issn 1664-8021
publishDate 2013-12-01
description The cost of next-generation sequencing is now approaching that of early GWAS panels, but it is still out of reach for large epidemiologic studies, and the millions of rare variants expected pose challenges for distinguishing causal from non-causal variants. We review two types of designs for sequencing studies: two-phase designs for targeted follow-up of genomewide association studies using unrelated individuals; and family-based designs exploiting co-segregation for prioritizing variants and genes. Two-phase designs subsample subjects for sequencing from a larger case-control study jointly on the basis of their disease and carrier status; the discovered variants are then tested for association in the parent study. The analysis combines the full sequence data from the substudy with the more limited SNP data from the main study. We discuss various methods for selecting this subset of variants and describe the expected yield of true positive associations in the context of an ongoing study of second breast cancers following radiotherapy. While the sharing of variants within families means that family-based designs are less efficient for discovery than sequencing unrelated individuals, the ability to exploit co-segregation of variants with disease within families helps distinguish causal from non-causal ones. Furthermore, by enriching for family history, the yield of causal variants can be improved, and use of identity-by-descent information improves imputation of genotypes for other family members. We compare the relative efficiency of these designs with those using unrelated individuals for discovering and prioritizing variants or genes for testing association in larger studies. While associations can be tested with single variants, power is low for rare ones. Recent generalizations of burden or kernel tests for gene-level associations to family-based data are appealing. These approaches are illustrated in the context of a family-based study of colorectal cancer.
topic Breast Neoplasms
Sequencing
colorectal cancer
two-phase sampling design
rare variant association
family-based study
url http://journal.frontiersin.org/Journal/10.3389/fgene.2013.00276/full