Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes.

A number of machine learning-based predictors have been developed for identifying immunogenic T-cell epitopes based on major histocompatibility complex (MHC) class I and II binding affinities. Rationally selecting the most appropriate tool has been complicated by the evolving training data and machi...

Bibliographic Details
Main Authors: Weilong Zhao, Xinwei Sher
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-11-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC6224037?pdf=render
id doaj-1e407d1f72334307852d2172b98d0a7c
record_format Article
spelling doaj-1e407d1f72334307852d2172b98d0a7c 2020-11-25T01:44:39Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2018-11-01 14 11 e1006457 10.1371/journal.pcbi.1006457 Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes. Weilong Zhao Xinwei Sher http://europepmc.org/articles/PMC6224037?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Weilong Zhao
Xinwei Sher
spellingShingle Weilong Zhao
Xinwei Sher
Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes.
PLoS Computational Biology
author_facet Weilong Zhao
Xinwei Sher
author_sort Weilong Zhao
title Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes.
title_short Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes.
title_full Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes.
title_fullStr Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes.
title_full_unstemmed Systematically benchmarking peptide-MHC binding predictors: From synthetic to naturally processed epitopes.
title_sort systematically benchmarking peptide-mhc binding predictors: from synthetic to naturally processed epitopes.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2018-11-01
description A number of machine learning-based predictors have been developed for identifying immunogenic T-cell epitopes based on major histocompatibility complex (MHC) class I and II binding affinities. Rationally selecting the most appropriate tool has been complicated by the evolving training data and machine learning methods. Despite recent advances in generating high-quality MHC-eluted, naturally processed ligandomes, the reliability of new predictors on these epitopes has yet to be evaluated. This study reports the latest benchmarking of an extensive set of MHC-binding predictors using newly available, untested data of both synthetic and naturally processed epitopes. The blind test set includes 32 human leukocyte antigen (HLA) class I and 24 HLA class II alleles. Artificial neural network (ANN)-based approaches demonstrated better performance than regression-based machine learning and structural modeling. Among the 18 predictors benchmarked, ANN-based mhcflurry and nn_align perform the best for MHC class I 9-mer and class II 15-mer predictions, respectively, on binding/non-binding classification (area under the curve = 0.911). NetMHCpan4 also demonstrated comparable predictive power. Our customization of mhcflurry to a pan-HLA predictor has achieved similar accuracy to NetMHCpan. The overall accuracy of these methods is comparable between 9-mer and 10-mer testing data. However, the top methods deliver low correlations between the predicted and the experimental affinities for strong MHC binders. When used on naturally processed MHC ligands, tools that have been trained on elution data (NetMHCpan4 and MixMHCpred) show better accuracy than pure binding-affinity predictors. The false prediction rate varies considerably among HLA types and datasets. Finally, the structure-based predictor Rosetta FlexPepDock performs less well than the machine learning approaches. With our benchmarking of MHC-binding and MHC-elution predictors using a comprehensive set of metrics, an unbiased view for establishing best practices in T-cell epitope prediction is presented, facilitating future development of methods in immunogenomics.
url http://europepmc.org/articles/PMC6224037?pdf=render
work_keys_str_mv AT weilongzhao systematicallybenchmarkingpeptidemhcbindingpredictorsfromsynthetictonaturallyprocessedepitopes
AT xinweisher systematicallybenchmarkingpeptidemhcbindingpredictorsfromsynthetictonaturallyprocessedepitopes
_version_ 1725027221514485760