Repeats and EST analysis for new organisms

Abstract. Background: Repeat masking is an important step in the EST analysis pipeline. For new species, genomic knowledge is scarce and good repeat libraries are typically unavailable. In these cases it is common practice to mask against known repeats fr...

Bibliographic Details
Main Authors: Jonassen Inge, Malde Ketil
Format: Article
Language: English
Published: BMC 2008-01-01
Series: BMC Genomics
Online Access: http://www.biomedcentral.com/1471-2164/9/23
id doaj-84b733f352194c58960c9c830c3ce8a7
record_format Article
spelling doaj-84b733f352194c58960c9c830c3ce8a7; 2020-11-24T21:14:29Z; eng; BMC; BMC Genomics; 1471-2164; 2008-01-01; 9; 1; 23; 10.1186/1471-2164-9-23; Repeats and EST analysis for new organisms; Jonassen Inge; Malde Ketil; http://www.biomedcentral.com/1471-2164/9/23
collection DOAJ
language English
format Article
sources DOAJ
author Jonassen Inge
Malde Ketil
title Repeats and EST analysis for new organisms
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2008-01-01
description Abstract. Background: Repeat masking is an important step in the EST analysis pipeline. For new species, genomic knowledge is scarce and good repeat libraries are typically unavailable. In these cases it is common practice to mask against known repeats from other species (i.e., model organisms). There are few studies that investigate the effectiveness of this approach, or attempt to evaluate the different methods for identifying and masking repeats. Results: Using zebrafish and medaka as example organisms, we show that accurate repeat masking is an important factor for obtaining a high-quality clustering. Furthermore, we show that masking with standard repeat libraries based on curated genomic information from other species has little or no positive effect on the quality of the resulting EST clustering. Library-based repeat masking, which often constitutes a computational bottleneck in the EST analysis pipeline, can therefore be reduced to species-specific repeat libraries, or perhaps eliminated entirely. In contrast, substantially improved results can be achieved by applying a repeat library derived from a partial reference clustering (e.g., from mapping sequences against a partially sequenced genome). Conclusion: Of the methods explored, we find that the best EST clustering is achieved after masking with repeat libraries that are species-specific. In the absence of such libraries, library-less masking gives results superior to the current practice of using cross-species, genome-based libraries.
url http://www.biomedcentral.com/1471-2164/9/23
work_keys_str_mv AT jonasseninge repeatsandestanalysisforneworganisms
AT maldeketil repeatsandestanalysisforneworganisms
_version_ 1716746999780343808