Identifying Pathogenic Amino Acid Substitutions in Human Proteins Using Deep Learning

Bibliographic Details
Main Author: Kvist, Alexander
Format: Others
Language: English
Published: KTH, Skolan för kemi, bioteknologi och hälsa (CBH) 2018
Subjects:
SNP
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233513
Description
Summary: Many genetic diseases are caused by non-synonymous single nucleotide polymorphisms (nsSNPs), which change the protein product encoded by a gene. Large-scale sequencing and population studies are making more information available on which variations are tolerated and which are not. Variant effect predictors use a wide range of information about such variations to predict their effect, often focusing on evolutionary information. Here, a novel variant effect predictor for amino acid substitutions is developed. The predictor is a deep convolutional neural network that incorporates evolutionary, sequence, and structural information to predict both the pathogenicity and the severity of amino acid substitutions. The model achieves state-of-the-art performance on benchmark datasets.