Gentle masking of low-complexity sequences improves homology search.

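Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with "gentle" masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0,S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to "harsh" masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search.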
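
In the gentle-masking scheme described in this paper, any aligned pair that involves a masked letter scores min(0, S), where S is the unmasked score. The toy sketch below contrasts that rule with no masking and with a simple model of harsh masking inside a Smith-Waterman local alignment. It is an illustration only, not the paper's software. The lowercase-means-masked convention, the match/mismatch/gap values, and modelling harsh masking as a fixed score of -1 for every pair that involves a masked letter are all assumptions, as are the names `pair_score` and `local_alignment_score`.

```python
"""Toy sketch: gentle vs. harsh masking in a local alignment score.

Assumptions (illustrative, not taken from the paper): lowercase letters are
masked, match = +1, mismatch = -1, each gap position costs 2, and "harsh"
masking gives any pair involving a masked letter a fixed score of -1.
"""

MATCH, MISMATCH, GAP = 1, -1, 2  # hypothetical scoring parameters
HARSH_SCORE = -1                 # hypothetical harsh-masking pair score
MODES = ("none", "gentle", "harsh")


def pair_score(a: str, b: str, mode: str) -> int:
    """Score one aligned letter pair; lowercase letters count as masked."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    s = MATCH if a.upper() == b.upper() else MISMATCH  # unmasked score S
    if mode == "none" or not (a.islower() or b.islower()):
        return s
    if mode == "gentle":
        return min(0, s)  # gentle masking: min(0, S)
    return HARSH_SCORE    # harsh masking, as modelled here


def local_alignment_score(x: str, y: str, mode: str) -> int:
    """Best Smith-Waterman local alignment score with a linear gap cost."""
    prev = [0] * (len(y) + 1)
    best = 0
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            cur[j] = max(
                0,
                prev[j - 1] + pair_score(x[i - 1], y[j - 1], mode),
                prev[j] - GAP,     # x[i-1] aligned to a gap
                cur[j - 1] - GAP,  # y[j-1] aligned to a gap
            )
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # Homologous pair whose middle is a masked low-complexity tract.
    x = "ACGTACGT" + "at" * 4 + "ACGTACGT"
    # Unrelated pair that shares only a masked low-complexity tract.
    p, q = "CCCC" + "at" * 6 + "GGGG", "TTTT" + "at" * 6 + "AAAA"
    for mode in MODES:
        print(mode, local_alignment_score(x, x, mode),
              local_alignment_score(p, q, mode))
    # none 24 12
    # gentle 16 0
    # harsh 8 0
```

With these assumed parameters, gentle masking lets the alignment of the homologous pair run across the masked tract (score 16, versus 8 under the harsh model), while the unrelated pair that shares only a masked tract scores 0 under both masking schemes. This is a toy illustration of the scoring rule, not evidence for the paper's sensitivity and specificity results.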

Bibliographic Details
Main Author: Martin C Frith
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2011-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC3242753?pdf=render
ISSN: 1932-6203
Volume: 6, Issue: 12, Article: e28819
DOI: 10.1371/journal.pone.0028819