S/HIC: Robust Identification of Soft and Hard Sweeps Using Machine Learning.

Detecting the targets of adaptive natural selection from whole genome sequencing data is a central problem for population genetics. However, to date most methods have shown sub-optimal performance under realistic demographic scenarios. Moreover, over the past decade there has been a renewed interest...

Bibliographic Details
Main Authors: Daniel R Schrider, Andrew D Kern
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2016-03-01
Series: PLoS Genetics
Online Access: http://europepmc.org/articles/PMC4792382?pdf=render
id doaj-18627fadc9da46929949fe673418e3a8
record_format Article
spelling doaj-18627fadc9da46929949fe673418e3a8 2020-11-24T21:41:59Z eng Public Library of Science (PLoS) PLoS Genetics 1553-7390 1553-7404 2016-03-01 12 3 e1005928 10.1371/journal.pgen.1005928 S/HIC: Robust Identification of Soft and Hard Sweeps Using Machine Learning. Daniel R Schrider Andrew D Kern http://europepmc.org/articles/PMC4792382?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Daniel R Schrider
Andrew D Kern
title S/HIC: Robust Identification of Soft and Hard Sweeps Using Machine Learning.
publisher Public Library of Science (PLoS)
series PLoS Genetics
issn 1553-7390
1553-7404
publishDate 2016-03-01
description Detecting the targets of adaptive natural selection from whole genome sequencing data is a central problem for population genetics. However, to date most methods have shown sub-optimal performance under realistic demographic scenarios. Moreover, over the past decade there has been a renewed interest in determining the importance of selection from standing variation in adaptation of natural populations, yet very few methods for inferring this model of adaptation at the genome scale have been introduced. Here we introduce a new method, S/HIC, which uses supervised machine learning to precisely infer the location of both hard and soft selective sweeps. We show that S/HIC has unrivaled accuracy for detecting sweeps under demographic histories that are relevant to human populations, and distinguishing sweeps from linked as well as neutrally evolving regions. Moreover, we show that S/HIC is uniquely robust among its competitors to model misspecification. Thus, even if the true demographic model of a population differs catastrophically from that specified by the user, S/HIC still retains impressive discriminatory power. Finally, we apply S/HIC to the case of resequencing data from human chromosome 18 in a European population sample, and demonstrate that we can reliably recover selective sweeps that have been identified earlier using less specific and sensitive methods.
url http://europepmc.org/articles/PMC4792382?pdf=render