Gentle masking of low-complexity sequences improves homology search.
Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. T...
Main Author: | Martin C Frith |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2011-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC3242753?pdf=render |
id |
doaj-906f564ab49244dda6c38def8767712a |
---|---|
record_format |
Article |
spelling |
Martin C Frith. Gentle masking of low-complexity sequences improves homology search. PLoS ONE 6(12): e28819, 2011-01-01. Public Library of Science (PLoS). ISSN 1932-6203. doi:10.1371/journal.pone.0028819. Language: eng. Record doaj-906f564ab49244dda6c38def8767712a, timestamp 2020-11-25T02:09:18Z. |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Martin C Frith |
title |
Gentle masking of low-complexity sequences improves homology search. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2011-01-01 |
description |
Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with "gentle" masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0,S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to "harsh" masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search. |
url |
http://europepmc.org/articles/PMC3242753?pdf=render |
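The description defines gentle masking by a single rule: a match score involving a masked letter is min(0,S), where S is the unmasked score. A minimal sketch of that rule follows. It is an illustration, not the article's implementation. It assumes that masked letters are written in lowercase, and the +1/-1 match/mismatch scores and all function names are hypothetical choices made for this example only.

```python
# Illustrative sketch of the min(0, S) rule from the description.
# Assumptions (not from the record): lowercase = masked, and a
# hypothetical +1 / -1 match / mismatch scoring scheme.

def unmasked_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Score S for aligning letters a and b, ignoring any masking."""
    return match if a.upper() == b.upper() else mismatch


def gentle_score(a: str, b: str) -> int:
    """Gentle masking: if either letter is masked, the score is min(0, S)."""
    s = unmasked_score(a, b)
    if a.islower() or b.islower():
        return min(0, s)  # masked matches earn nothing; mismatches still cost
    return s


def ungapped_score(x: str, y: str) -> int:
    """Sum of gentle scores over an ungapped alignment of equal-length strings."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(gentle_score(a, b) for a, b in zip(x, y))


if __name__ == "__main__":
    print(ungapped_score("ACGTatat", "ACGTATAT"))
    print(ungapped_score("atatatat", "ATATATAT"))
```

Under these assumptions, a masked tract can never raise an alignment score, but mismatches inside it are still penalized.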