Accucopy: accurate and fast inference of allele-specific copy number alterations from low-coverage low-purity tumor sequencing data

Abstract Background: Copy number alterations (CNAs), due to their large impact on the genome, have been an important contributing factor to oncogenesis and metastasis. Detecting genomic alterations from the shallow-sequencing data of a low-purity tumor sample remains a challenging task. Results: We in...

Bibliographic Details
Main Authors: Xinping Fan, Guanghao Luo, Yu S. Huang
Format: Article
Language: English
Published: BMC 2021-01-01
Series: BMC Bioinformatics
Subjects: Cancer genomics; Copy number alterations; Next-generation sequencing
Online Access: https://doi.org/10.1186/s12859-020-03924-5
id doaj-8e1c1d18a88145818f2ac4de7ca18f84
record_format Article
spelling doaj-8e1c1d18a88145818f2ac4de7ca18f84
2021-01-17T12:59:24Z
eng
BMC
BMC Bioinformatics
1471-2105
2021-01-01
22
1
1
18
10.1186/s12859-020-03924-5
Accucopy: accurate and fast inference of allele-specific copy number alterations from low-coverage low-purity tumor sequencing data
Xinping Fan 0
Guanghao Luo 1
Yu S. Huang 2
Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences
School of Pharmaceutical Sciences, Jilin University
Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences
Abstract Background: Copy number alterations (CNAs), due to their large impact on the genome, have been an important contributing factor to oncogenesis and metastasis. Detecting genomic alterations from the shallow-sequencing data of a low-purity tumor sample remains a challenging task. Results: We introduce Accucopy, a method to infer total copy numbers (TCNs) and allele-specific copy numbers (ASCNs) from challenging low-purity and low-coverage tumor samples. Accucopy adopts many robust statistical techniques such as kernel smoothing of coverage differentiation information to discern signals from noise and combines ideas from time-series analysis and the signal-processing field to derive a range of estimates for the period in a histogram of coverage differentiation information. Statistical learning models such as the tiered Gaussian mixture model, the expectation–maximization algorithm, and sparse Bayesian learning were customized and built into the model. Accucopy is implemented in C++/Rust, packaged in a Docker image, and supports non-human samples; more at http://www.yfish.org/software/. Conclusions: We describe Accucopy, a method that can predict both TCNs and ASCNs from low-coverage low-purity tumor sequencing data. Through comparative analyses in both simulated and real-sequencing samples, we demonstrate that Accucopy is more accurate than Sclust, ABSOLUTE, and Sequenza.
https://doi.org/10.1186/s12859-020-03924-5
Cancer genomics
Copy number alterations
Next-generation sequencing
collection DOAJ
language English
format Article
sources DOAJ
author Xinping Fan
Guanghao Luo
Yu S. Huang
title Accucopy: accurate and fast inference of allele-specific copy number alterations from low-coverage low-purity tumor sequencing data
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2021-01-01
description Abstract Background: Copy number alterations (CNAs), due to their large impact on the genome, have been an important contributing factor to oncogenesis and metastasis. Detecting genomic alterations from the shallow-sequencing data of a low-purity tumor sample remains a challenging task. Results: We introduce Accucopy, a method to infer total copy numbers (TCNs) and allele-specific copy numbers (ASCNs) from challenging low-purity and low-coverage tumor samples. Accucopy adopts many robust statistical techniques such as kernel smoothing of coverage differentiation information to discern signals from noise and combines ideas from time-series analysis and the signal-processing field to derive a range of estimates for the period in a histogram of coverage differentiation information. Statistical learning models such as the tiered Gaussian mixture model, the expectation–maximization algorithm, and sparse Bayesian learning were customized and built into the model. Accucopy is implemented in C++/Rust, packaged in a Docker image, and supports non-human samples; more at http://www.yfish.org/software/. Conclusions: We describe Accucopy, a method that can predict both TCNs and ASCNs from low-coverage low-purity tumor sequencing data. Through comparative analyses in both simulated and real-sequencing samples, we demonstrate that Accucopy is more accurate than Sclust, ABSOLUTE, and Sequenza.
topic Cancer genomics
Copy number alterations
Next-generation sequencing
url https://doi.org/10.1186/s12859-020-03924-5