Trainable high resolution melt curve machine learning classifier for large-scale reliable genotyping of sequence variants.

High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurate genotyping of an unknown sample for which a large number of possible variants may exist will require an automated HRM curve identification method capable of...


Bibliographic Details
Main Authors: Pornpat Athamanolap, Vishwa Parekh, Stephanie I Fraley, Vatsal Agarwal, Dong J Shin, Michael A Jacobs, Tza-Huei Wang, Samuel Yang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4183555?pdf=render
id doaj-d7649fa0c06545d5a0c9201004aa0fc9
record_format Article
spelling doaj-d7649fa0c06545d5a0c9201004aa0fc9 | 2020-11-25T01:55:53Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2014-01-01 | 9 | 9 | e109094 | 10.1371/journal.pone.0109094 | Trainable high resolution melt curve machine learning classifier for large-scale reliable genotyping of sequence variants. | Pornpat Athamanolap; Vishwa Parekh; Stephanie I Fraley; Vatsal Agarwal; Dong J Shin; Michael A Jacobs; Tza-Huei Wang; Samuel Yang | http://europepmc.org/articles/PMC4183555?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Pornpat Athamanolap
Vishwa Parekh
Stephanie I Fraley
Vatsal Agarwal
Dong J Shin
Michael A Jacobs
Tza-Huei Wang
Samuel Yang
author_sort Pornpat Athamanolap
title Trainable high resolution melt curve machine learning classifier for large-scale reliable genotyping of sequence variants.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2014-01-01
description High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurate genotyping of an unknown sample for which a large number of possible variants may exist will require an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new method for automated HRM curve classification based on machine learning methods and learned tolerance for reaction condition deviations. We tested this method in silico through multiple cross-validations using curves generated from 9 different simulated experimental conditions to classify 92 known serotypes of Streptococcus pneumoniae and demonstrated over 99% accuracy with 8 training curves per serotype. In vitro verification of the algorithm was tested using sequence variants of a cancer-related gene and demonstrated 100% accuracy with 3 training curves per sequence variant. The machine learning algorithm enabled reliable, scalable, and automated HRM genotyping analysis with broad potential clinical and epidemiological applications.
url http://europepmc.org/articles/PMC4183555?pdf=render
_version_ 1724982816653967360