Gene characteristics predicting missense, nonsense and frameshift mutations in tumor samples

Abstract. Background: Because driver mutations provide a selective advantage to the mutant clone, they tend to occur at a higher frequency in tumor samples than selectively neutral (passenger) mutations. However, mutation frequency alone is insufficient to identify cancer genes because mutability...

Bibliographic Details
Main Authors: Ivan P. Gorlov, Claudio W. Pikielny, Hildreth R. Frost, Stephanie C. Her, Michael D. Cole, Samuel D. Strohbehn, David Wallace-Bradley, Marek Kimmel, Olga Y. Gorlova, Christopher I. Amos
Format: Article
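Language: English
Published: BMC 2018-11-01
Series: BMC Bioinformatics
Subjects: Catalog of Somatic Mutations in Cancer; COSMIC; Somatic mutations; Missense; Nonsense; Frameshift mutations
Online Access: http://link.springer.com/article/10.1186/s12859-018-2455-0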
id doaj-c7fe5ced7f304ca98c74b3534ff89c04
record_format Article
spelling doaj-c7fe5ced7f304ca98c74b3534ff89c04 2020-11-25T01:15:08Z eng BMC BMC Bioinformatics 1471-2105 2018-11-01 191114 10.1186/s12859-018-2455-0
Gene characteristics predicting missense, nonsense and frameshift mutations in tumor samples
Ivan P. Gorlov (0), Claudio W. Pikielny (1), Hildreth R. Frost (2), Stephanie C. Her (3), Michael D. Cole (4), Samuel D. Strohbehn (5), David Wallace-Bradley (6), Marek Kimmel (7), Olga Y. Gorlova (8), Christopher I. Amos (9)
Authors 0-5, 8, 9: The Geisel School of Medicine, Department of Biomedical Data Science, Dartmouth College, HB7936, One Medical Center Dr., Dartmouth-Hitchcock Medical Center
Authors 6, 7: Department of Statistics, Rice University
collection DOAJ
language English
format Article
sources DOAJ
author Ivan P. Gorlov
Claudio W. Pikielny
Hildreth R. Frost
Stephanie C. Her
Michael D. Cole
Samuel D. Strohbehn
David Wallace-Bradley
Marek Kimmel
Olga Y. Gorlova
Christopher I. Amos
author_sort Ivan P. Gorlov
title Gene characteristics predicting missense, nonsense and frameshift mutations in tumor samples
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2018-11-01
description Abstract. Background: Because driver mutations provide a selective advantage to the mutant clone, they tend to occur at a higher frequency in tumor samples than selectively neutral (passenger) mutations. However, mutation frequency alone is insufficient to identify cancer genes because mutability is influenced by many gene characteristics, such as size and nucleotide composition. The goal of this study was to identify gene characteristics associated with the frequency of somatic mutations in a gene in tumor samples. Results: We used data on somatic mutations detected by genome-wide screens from the Catalog of Somatic Mutations in Cancer (COSMIC). Eleven gene characteristics, including gene size, nucleotide composition, expression level, relative replication time in the cell cycle, and level of evolutionary conservation, were used as predictors of the number of somatic mutations. We applied stepwise multiple linear regression to predict the number of mutations per gene. Because missense, nonsense, and frameshift mutations are associated with different sets of gene characteristics, they were modeled separately. Gene characteristics explain 88% of the variation in the number of missense mutations, 40% of nonsense mutations, and 23% of frameshift mutations. Comparing the observed and expected numbers of mutations identified genes with a higher than expected number of mutations (positive outliers). Many of these are known driver genes. A number of novel candidate driver genes were also identified. Conclusions: By comparing the observed and predicted number of mutations in a gene, we identified known cancer-associated genes as well as 111 novel cancer-associated genes. We also showed that adding the number of silent mutations per gene reported by genome/exome-wide screens across all cancer types (COSMIC data) as a predictor yields a prediction accuracy that substantially exceeds that of the most popular cancer gene prediction tool, MutSigCV.
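The approach summarized in the description (stepwise multiple linear regression, followed by a comparison of observed and expected counts to flag positive outliers) can be illustrated with a minimal sketch. This is not the authors' implementation. The forward-selection rule (a minimum gain in R², `min_gain`), the z-score cutoff (`z_cut`), the helper names, and the synthetic data are all assumptions made for illustration; the record does not state which selection criterion or outlier threshold the study used.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot


def forward_stepwise(X, y, min_gain=0.01):
    """Greedy forward selection (assumed criterion): repeatedly add the
    predictor that raises R^2 the most, stopping when the gain < min_gain."""
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = [(fit_ols(X[:, selected + [j]], y)[1], j) for j in remaining]
        r2, j = max(scores)
        if r2 - best_r2 < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
    return selected, best_r2


def positive_outliers(X, y, selected, z_cut=3.0):
    """Indices of rows whose observed value exceeds the model's expected
    value by more than z_cut standard deviations of the residuals."""
    beta, _ = fit_ols(X[:, selected], y)
    expected = np.column_stack([np.ones(len(y)), X[:, selected]]) @ beta
    resid = y - expected
    z = (resid - resid.mean()) / resid.std()
    return np.flatnonzero(z > z_cut)


# Synthetic stand-in for per-gene data: 4 hypothetical gene characteristics,
# of which only columns 0 and 2 actually drive the (simulated) mutation count.
rng = np.random.default_rng(0)
n_genes = 500
X = rng.normal(size=(n_genes, 4))
y = 3.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n_genes)
y[7] += 15.0  # gene 7 is given far more mutations than its features predict

selected, r2 = forward_stepwise(X, y)
outliers = positive_outliers(X, y, selected)
print("selected predictors:", selected)
print("R^2:", round(r2, 3))
print("positive outliers:", outliers)
```

In this sketch, modeling missense, nonsense, and frameshift mutations separately would mean running `forward_stepwise` once per mutation type with a different `y`, which lets each type end up with its own set of selected predictors.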
topic Catalog of somatic mutations in Cancer
COSMIC
Somatic mutations
Missense
Nonsense
Frameshift mutations
url http://link.springer.com/article/10.1186/s12859-018-2455-0