Improved Prediction of Non-methylated Islands in Vertebrates Highlights Different Characteristic Sequence Patterns.

Non-methylated islands (NMIs) of DNA are genomic regions that are important for gene regulation and development. A recent study of genome-wide non-methylation data in vertebrates by Long et al. (eLife 2013;2:e00348) has shown that many experimentally identified non-methylated regions do not overlap...

Bibliographic Details
Main Authors: Matthew Huska, Martin Vingron
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2016-12-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC5161304?pdf=render
id doaj-9a2f72efbc1743e5aca3f55b2b42d82c
record_format Article
spelling doaj-9a2f72efbc1743e5aca3f55b2b42d82c; 2020-11-25T02:12:16Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2016-12-01; 12; 12; e1005249; 10.1371/journal.pcbi.1005249; Improved Prediction of Non-methylated Islands in Vertebrates Highlights Different Characteristic Sequence Patterns.; Matthew Huska; Martin Vingron; http://europepmc.org/articles/PMC5161304?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Matthew Huska
Martin Vingron
author_facet Matthew Huska
Martin Vingron
author_sort Matthew Huska
title Improved Prediction of Non-methylated Islands in Vertebrates Highlights Different Characteristic Sequence Patterns.
title_sort improved prediction of non-methylated islands in vertebrates highlights different characteristic sequence patterns.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2016-12-01
description Non-methylated islands (NMIs) of DNA are genomic regions that are important for gene regulation and development. A recent study of genome-wide non-methylation data in vertebrates by Long et al. (eLife 2013;2:e00348) has shown that many experimentally identified non-methylated regions do not overlap with classically defined CpG islands which are computationally predicted using simple DNA sequence features. This is especially true in cold-blooded vertebrates such as Danio rerio (zebrafish). In order to investigate how predictive DNA sequence is of a region's methylation status, we applied a supervised learning approach using a spectrum kernel support vector machine, to see if a more complex model and supervised learning can be used to improve non-methylated island prediction and to understand the sequence properties of these regions. We demonstrate that DNA sequence is highly predictive of methylation status, and that in contrast to existing CpG island prediction methods our method is able to provide more useful predictions of NMIs genome-wide in all vertebrate organisms that were studied. Our results also show that in cold-blooded vertebrates (Anolis carolinensis, Xenopus tropicalis and Danio rerio) where genome-wide classical CpG island predictions consist primarily of false positives, longer primarily AT-rich DNA sequence features are able to identify these regions much more accurately.
url http://europepmc.org/articles/PMC5161304?pdf=render