NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer

Abstract Background: The accurate screening of tumor genomic landscapes for somatic mutations using high-throughput sequencing is a crucial step in precise clinical diagnosis and targeted therapy. However, the complex inherent features of cancer tissue, especially tumor genetic intra-heterogen...


Bibliographic Details
Main Authors: Irantzu Anzar, Angelina Sverchkova, Richard Stratford, Trevor Clancy
Format: Article
Language: English
Published: BMC 2019-05-01
Series: BMC Medical Genomics
Subjects: Somatic variant detection; Machine learning; Cancer genomics; Precision medicine
Online Access: http://link.springer.com/article/10.1186/s12920-019-0508-5
id doaj-2ca0de2b9f6c430187e30e04f48eef1b
record_format Article
spelling doaj-2ca0de2b9f6c430187e30e04f48eef1b
2021-04-02T14:10:28Z
eng
BMC
BMC Medical Genomics
1755-8794
2019-05-01
12
1
1
14
10.1186/s12920-019-0508-5
NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer
Irantzu Anzar (OncoImmunity AS, Oslo Cancer Cluster)
Angelina Sverchkova (OncoImmunity AS, Oslo Cancer Cluster)
Richard Stratford (OncoImmunity AS, Oslo Cancer Cluster)
Trevor Clancy (OncoImmunity AS, Oslo Cancer Cluster)
Abstract Background: The accurate screening of tumor genomic landscapes for somatic mutations using high-throughput sequencing is a crucial step in precise clinical diagnosis and targeted therapy. However, the complex inherent features of cancer tissue, especially tumor genetic intra-heterogeneity coupled with the problem of sequencing and alignment artifacts, make somatic variant calling a challenging task. Current variant filtering strategies, such as rule-based filtering and consensus voting of different algorithms, have previously helped to increase specificity, although this comes at the cost of sensitivity. Methods: In light of this, we have developed the NeoMutate framework, which incorporates 7 supervised machine learning (ML) algorithms to exploit the strengths of multiple variant callers, using a non-redundant set of biological and sequence features. We benchmarked NeoMutate by simulating more than 10,000 bona fide cancer-related mutations into three well-characterized Genome in a Bottle (GIAB) reference samples. Results: A robust and exhaustive evaluation of NeoMutate’s performance based on 5-fold cross-validation experiments, in addition to 3 independent tests, demonstrated substantially improved variant detection accuracy compared to any of its individual composite variant callers and to consensus calling of multiple tools. Conclusions: We show here that integrating multiple tools in an ensemble ML layer optimizes somatic variant detection rates, leading to a potentially improved variant selection framework for the diagnosis and treatment of cancer.
http://link.springer.com/article/10.1186/s12920-019-0508-5
Somatic variant detection
Machine learning
Cancer genomics
Precision medicine
collection DOAJ
language English
format Article
sources DOAJ
author Irantzu Anzar
Angelina Sverchkova
Richard Stratford
Trevor Clancy
spellingShingle Irantzu Anzar
Angelina Sverchkova
Richard Stratford
Trevor Clancy
NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer
BMC Medical Genomics
Somatic variant detection
Machine learning
Cancer genomics
Precision medicine
author_facet Irantzu Anzar
Angelina Sverchkova
Richard Stratford
Trevor Clancy
author_sort Irantzu Anzar
title NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer
title_sort neomutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer
publisher BMC
series BMC Medical Genomics
issn 1755-8794
publishDate 2019-05-01
description Abstract Background: The accurate screening of tumor genomic landscapes for somatic mutations using high-throughput sequencing is a crucial step in precise clinical diagnosis and targeted therapy. However, the complex inherent features of cancer tissue, especially tumor genetic intra-heterogeneity coupled with the problem of sequencing and alignment artifacts, make somatic variant calling a challenging task. Current variant filtering strategies, such as rule-based filtering and consensus voting of different algorithms, have previously helped to increase specificity, although this comes at the cost of sensitivity. Methods: In light of this, we have developed the NeoMutate framework, which incorporates 7 supervised machine learning (ML) algorithms to exploit the strengths of multiple variant callers, using a non-redundant set of biological and sequence features. We benchmarked NeoMutate by simulating more than 10,000 bona fide cancer-related mutations into three well-characterized Genome in a Bottle (GIAB) reference samples. Results: A robust and exhaustive evaluation of NeoMutate’s performance based on 5-fold cross-validation experiments, in addition to 3 independent tests, demonstrated substantially improved variant detection accuracy compared to any of its individual composite variant callers and to consensus calling of multiple tools. Conclusions: We show here that integrating multiple tools in an ensemble ML layer optimizes somatic variant detection rates, leading to a potentially improved variant selection framework for the diagnosis and treatment of cancer.
topic Somatic variant detection
Machine learning
Cancer genomics
Precision medicine
url http://link.springer.com/article/10.1186/s12920-019-0508-5
work_keys_str_mv AT irantzuanzar neomutateanensemblemachinelearningframeworkforthepredictionofsomaticmutationsincancer
AT angelinasverchkova neomutateanensemblemachinelearningframeworkforthepredictionofsomaticmutationsincancer
AT richardstratford neomutateanensemblemachinelearningframeworkforthepredictionofsomaticmutationsincancer
AT trevorclancy neomutateanensemblemachinelearningframeworkforthepredictionofsomaticmutationsincancer
_version_ 1721562906584154112
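
Illustrative note on the abstract: the description states that consensus voting across variant callers raises specificity at the cost of sensitivity. The sketch below is not NeoMutate and is not taken from the article; it is a minimal Python illustration of that trade-off only. The caller names, the variant tuples, the truth set, and the helper names `consensus`, `sensitivity`, and `false_positives` are all hypothetical, invented for this example.

```python
from typing import Dict, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def consensus(calls: Dict[str, Set[Variant]], min_callers: int) -> Set[Variant]:
    """Keep only variants reported by at least `min_callers` callers."""
    counts: Dict[Variant, int] = {}
    for variants in calls.values():
        for variant in variants:
            counts[variant] = counts.get(variant, 0) + 1
    return {variant for variant, n in counts.items() if n >= min_callers}


def sensitivity(called: Set[Variant], truth: Set[Variant]) -> float:
    """Fraction of true variants that were called."""
    return len(called & truth) / len(truth)


def false_positives(called: Set[Variant], truth: Set[Variant]) -> int:
    """Number of called variants that are not in the truth set."""
    return len(called - truth)


# Invented toy data: four true somatic variants (A-D) and two artifacts (X, Y).
A: Variant = ("chr1", 100, "A", "T")
B: Variant = ("chr1", 200, "G", "C")
C: Variant = ("chr2", 300, "T", "G")
D: Variant = ("chr3", 400, "C", "A")
X: Variant = ("chr4", 500, "A", "G")
Y: Variant = ("chr5", 600, "G", "T")

truth: Set[Variant] = {A, B, C, D}

# Hypothetical caller outputs; the caller names are placeholders.
calls: Dict[str, Set[Variant]] = {
    "caller_1": {A, B, C, X},
    "caller_2": {A, B, Y},
    "caller_3": {A, D, X},
}

# Sweep the vote threshold and record (sensitivity, false positives) for each.
results = {
    k: (sensitivity(consensus(calls, k), truth),
        false_positives(consensus(calls, k), truth))
    for k in (1, 2, 3)
}

for k, (sens, fp) in results.items():
    print(f"min_callers={k}: sensitivity={sens:.2f}, false_positives={fp}")
```

On this toy data, raising the vote threshold removes false positives but also discards true variants that only one caller found. That loss is the motivation the abstract gives for replacing a fixed vote threshold with a learned ensemble layer.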