Are sites with multiple single nucleotide variants in cancer genomes a consequence of drivers, hypermutable sites or sequencing errors?

Across independent cancer genomes it has been observed that some sites have been recurrently hit by single nucleotide variants (SNVs). Such recurrently hit sites might be either (i) drivers of cancer that are positively selected during oncogenesis, (ii) due to mutation rate variation, or (iii) due to...

Bibliographic Details
Main Authors: Thomas C.A. Smith, Antony M. Carr, Adam C. Eyre-Walker
Format: Article
Language: English
Published: PeerJ Inc. 2016-09-01
Series: PeerJ
Subjects: Cancer; Somatic; Variation; Mutation; Mutation rate variation; Sequencing error
Online Access: https://peerj.com/articles/2391.pdf
id doaj-fead253178594e1494f5676b1f8499c7
record_format Article
spelling doaj-fead253178594e1494f5676b1f8499c7
2020-11-24T20:54:22Z
eng
PeerJ Inc.
PeerJ
issn 2167-8359
2016-09-01
volume 4
article e2391
doi 10.7717/peerj.2391
Thomas C.A. Smith (School of Life Sciences, University of Sussex, Brighton, East Sussex, United Kingdom)
Antony M. Carr (Genome Damage and Stability Centre, University of Sussex, Brighton, East Sussex, United Kingdom)
Adam C. Eyre-Walker (School of Life Sciences, University of Sussex, Brighton, East Sussex, United Kingdom)
collection DOAJ
language English
format Article
sources DOAJ
author Thomas C.A. Smith
Antony M. Carr
Adam C. Eyre-Walker
title Are sites with multiple single nucleotide variants in cancer genomes a consequence of drivers, hypermutable sites or sequencing errors?
publisher PeerJ Inc.
series PeerJ
issn 2167-8359
publishDate 2016-09-01
description Across independent cancer genomes it has been observed that some sites have been recurrently hit by single nucleotide variants (SNVs). Such recurrently hit sites might be either (i) drivers of cancer that are positively selected during oncogenesis, (ii) due to mutation rate variation, or (iii) due to sequencing and assembly errors. We have investigated the cause of recurrently hit sites in a dataset of >3 million SNVs from 507 complete cancer genome sequences. We find evidence that many sites have been hit significantly more often than one would expect by chance, even taking into account the effect of the adjacent nucleotides on the rate of mutation. We find that the density of these recurrently hit sites is higher in non-coding than coding DNA and hence conclude that most of them are unlikely to be drivers. We also find that most of them are found in parts of the genome that are not uniquely mappable and hence are likely to be due to mapping errors. In support of the error hypothesis, we find that recurrently hit sites are not randomly distributed across sequences from different laboratories. We fit a model to the data in which the rate of mutation is constant across sites but the rate of error varies. This model suggests that ∼4% of all SNVs are errors in this dataset, but that the rate of error varies by thousands-of-fold between sites.
topic Cancer
Somatic
Variation
Mutation
Mutation rate variation
Sequencing error
url https://peerj.com/articles/2391.pdf