Evaluating the impact of scoring parameters on the structure of intra-specific genetic variation using RawGeno, an R package for automating AFLP scoring

Abstract. Background: Since the transfer and application of modern sequencing technologies to the analysis of amplified fragment-length polymorphisms (AFLP), evolutionary biologists have included an increasing number of samples and markers in their studie...

Bibliographic Details
Main Authors:	Gerdes Tommy, Ehrich Dorothee, Tuszynski Jarek W, Arrigo Nils, Alvarez Nadir
Format:	Article
Language:	English
Published:	BMC 2009-01-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/10/33

id	doaj-e08835309a4d481ca3215f6144e587cf
record_format	Article
spelling	doaj-e08835309a4d481ca3215f6144e587cf; 2020-11-25T02:27:12Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2009-01-01; 10; 1; 33; 10.1186/1471-2105-10-33; Evaluating the impact of scoring parameters on the structure of intra-specific genetic variation using RawGeno, an R package for automating AFLP scoring; Gerdes Tommy; Ehrich Dorothee; Tuszynski Jarek W; Arrigo Nils; Alvarez Nadir; http://www.biomedcentral.com/1471-2105/10/33
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Gerdes Tommy Ehrich Dorothee Tuszynski Jarek W Arrigo Nils Alvarez Nadir
title	Evaluating the impact of scoring parameters on the structure of intra-specific genetic variation using RawGeno, an R package for automating AFLP scoring
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2009-01-01
description	Abstract. Background: Since the transfer and application of modern sequencing technologies to the analysis of amplified fragment-length polymorphisms (AFLP), evolutionary biologists have included an increasing number of samples and markers in their studies. Although justified in this context, the use of automated scoring procedures may result in technical biases that weaken the power and reliability of further analyses. Results: Using a new scoring algorithm, RawGeno, we show that scoring errors – in particular "bin oversplitting" (i.e. when variant sizes of the same AFLP marker are not considered as homologous) and "technical homoplasy" (i.e. when two AFLP markers that differ slightly in size are mistakenly considered as being homologous) – induce a loss of discriminatory power, decrease the robustness of results and, in extreme cases, introduce erroneous information in genetic structure analyses. In the present study, we evaluate several descriptive statistics that can be used to optimize the scoring of the AFLP analysis, and we describe a new statistic, the information content per bin (I_bin), that represents a valuable estimator during the optimization process. This statistic can be computed at any stage of the AFLP analysis without requiring the inclusion of replicated samples. Finally, we show that downstream analyses are not equally sensitive to scoring errors. Indeed, although a reasonable amount of flexibility is allowed during the optimization of the scoring procedure without causing considerable changes in the detection of genetic structure patterns, notable discrepancies are observed when estimating genetic diversities from differently scored datasets. Conclusion: Our algorithm appears to perform as well as a commercial program in automating AFLP scoring, at least in the context of population genetics or phylogeographic studies. To our knowledge, RawGeno is the only freely available public-domain software for fully automated AFLP scoring, from electropherogram files to user-defined working binary matrices. RawGeno was implemented in an R CRAN package (with a user-friendly GUI) and can be found at http://sourceforge.net/projects/rawgeno.
url	http://www.biomedcentral.com/1471-2105/10/33

Similar Items