R-Gada: a fast and flexible pipeline for copy number analysis in association studies
Abstract. Background: Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variati...
Main Authors: | Cáceres Alejandro, Pique-Regi Roger, González Juan R |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2010-07-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/11/380 |
id |
doaj-d050b429be304fc1ab49f9018f44754f |
---|---|
record_format |
Article |
spelling |
doaj-d050b429be304fc1ab49f9018f44754f; 2020-11-24T21:33:42Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2010-07-01; 11; 1; 380; 10.1186/1471-2105-11-380; R-Gada: a fast and flexible pipeline for copy number analysis in association studies; Cáceres Alejandro; Pique-Regi Roger; González Juan R; http://www.biomedcentral.com/1471-2105/11/380 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Cáceres Alejandro Pique-Regi Roger González Juan R |
author_sort |
Cáceres Alejandro |
title |
R-Gada: a fast and flexible pipeline for copy number analysis in association studies |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2010-07-01 |
description |
Abstract. Background: Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variation (i.e., SNPs) has previously failed to provide a clear association. Results: Here we present a new R package that integrates: (i) data import from the most common formats of Affymetrix, Illumina and aCGH arrays; (ii) a fast and accurate segmentation algorithm to call CNVs based on Genome Alteration Detection Analysis (GADA); and (iii) functions for displaying and exporting the copy number calls, identifying recurrent CNVs, performing multivariate analysis of population structure, and running association studies. Using a large dataset of 270 HapMap individuals (Affymetrix Human SNP Array 6.0 Sample Dataset), we demonstrate a flexible pipeline implemented with the package. It requires less than one minute per sample (3 million probe arrays) on a single-core computer and provides flexible parallelization for very large datasets. Case-control data were generated from the HapMap dataset to demonstrate a GWAS analysis. Conclusions: The package provides the tools for creating a complete integrated pipeline from data normalization to statistical association. It can efficiently handle a massive volume of data consisting of millions of genetic markers and hundreds or thousands of samples with very accurate results. |
url |
http://www.biomedcentral.com/1471-2105/11/380 |
_version_ |
1725952355988406272 |