A Deep Learning Approach for Detecting Copy Number Variation in Next-Generation Sequencing Data

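Copy number variants (CNV) are associated with phenotypic variation in several species. However, properly detecting changes in copy numbers of sequences remains a difficult problem, especially in lower quality or lower coverage next-generation sequencing data. Here, inspired by recent applications of machine learning in genomics, we describe a method to detect duplications and deletions in short-read sequencing data. In low coverage data, machine learning appears to be more powerful in the detection of CNVs than the gold-standard methods of coverage estimation alone, and of equal power in high coverage data. We also demonstrate how replicating training sets allows a more precise detection of CNVs, even identifying novel CNVs in two genomes previously surveyed thoroughly for CNVs using long read data.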


Bibliographic Details
Main Authors: Tom Hill, Robert L. Unckless
Format: Article
Language: English
Published: Oxford University Press 2019-11-01
Series: G3: Genes, Genomes, Genetics
Subjects: coverage; deletion; duplication; machine-learning; next-generation sequencing
Online Access: http://g3journal.org/lookup/doi/10.1534/g3.119.400596
id doaj-ba44e60bc6b24ec187e15a04c00cefc2
record_format Article
spelling doaj-ba44e60bc6b24ec187e15a04c00cefc2 | 2021-07-02T12:26:05Z | eng | Oxford University Press | G3: Genes, Genomes, Genetics | 2160-1836 | 2019-11-01 | 9 | 11 | 3575 | 3582 | 10.1534/g3.119.400596 | 8 | A Deep Learning Approach for Detecting Copy Number Variation in Next-Generation Sequencing Data | Tom Hill | Robert L. Unckless | Copy number variants (CNV) are associated with phenotypic variation in several species. However, properly detecting changes in copy numbers of sequences remains a difficult problem, especially in lower quality or lower coverage next-generation sequencing data. Here, inspired by recent applications of machine learning in genomics, we describe a method to detect duplications and deletions in short-read sequencing data. In low coverage data, machine learning appears to be more powerful in the detection of CNVs than the gold-standard methods of coverage estimation alone, and of equal power in high coverage data. We also demonstrate how replicating training sets allows a more precise detection of CNVs, even identifying novel CNVs in two genomes previously surveyed thoroughly for CNVs using long read data. | http://g3journal.org/lookup/doi/10.1534/g3.119.400596 | coverage | deletion | duplication | machine-learning | next-generation sequencing
collection DOAJ
language English
format Article
sources DOAJ
author Tom Hill
Robert L. Unckless
title A Deep Learning Approach for Detecting Copy Number Variation in Next-Generation Sequencing Data
publisher Oxford University Press
series G3: Genes, Genomes, Genetics
issn 2160-1836
publishDate 2019-11-01
description Copy number variants (CNV) are associated with phenotypic variation in several species. However, properly detecting changes in copy numbers of sequences remains a difficult problem, especially in lower quality or lower coverage next-generation sequencing data. Here, inspired by recent applications of machine learning in genomics, we describe a method to detect duplications and deletions in short-read sequencing data. In low coverage data, machine learning appears to be more powerful in the detection of CNVs than the gold-standard methods of coverage estimation alone, and of equal power in high coverage data. We also demonstrate how replicating training sets allows a more precise detection of CNVs, even identifying novel CNVs in two genomes previously surveyed thoroughly for CNVs using long read data.
topic coverage
deletion
duplication
machine-learning
next-generation sequencing
url http://g3journal.org/lookup/doi/10.1534/g3.119.400596