Accuracy of a machine learning method based on structural and locational information from AlphaFold2 for predicting the pathogenicity of TARDBP and FUS gene variants in ALS

BACKGROUND: In the sporadic form of amyotrophic lateral sclerosis (ALS), the pathogenicity of rare variants in the causative genes characterizing the familial form remains largely unknown. To predict the pathogenicity of such variants, in silico analysis is commonly used. In some ALS causative genes...

Bibliographic Details
Main Authors: Hatano, Y. (Author), Ishihara, T. (Author), Onodera, O. (Author)
Format: Article
Language: English
Published: NLM (Medline) 2023
Subjects: AlphaFold2; Amyotrophic lateral sclerosis; Missense variant; MOVA; Prediction tool
Online Access: View Fulltext in Publisher
View in Scopus
LEADER 03321nam a2200229Ia 4500
001 10.1186-s12859-023-05338-5
008 230529s2023 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a Accuracy of a machine learning method based on structural and locational information from AlphaFold2 for predicting the pathogenicity of TARDBP and FUS gene variants in ALS 
260 0 |b NLM (Medline)  |c 2023 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-023-05338-5 
856 |z View in Scopus  |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159718025&doi=10.1186%2fs12859-023-05338-5&partnerID=40&md5=340f08d434ae25e2d02924b143bf0e13 
520 3 |a BACKGROUND: In the sporadic form of amyotrophic lateral sclerosis (ALS), the pathogenicity of rare variants in the causative genes characterizing the familial form remains largely unknown. To predict the pathogenicity of such variants, in silico analysis is commonly used. In some ALS causative genes, the pathogenic variants are concentrated in specific regions, and the resulting alterations in protein structure are thought to significantly affect pathogenicity. However, existing methods have not taken this regional concentration into account. To address this, we have developed a technique termed MOVA (method for evaluating the pathogenicity of missense variants using AlphaFold2), which uses the positions of variants in the protein structure predicted by AlphaFold2. Here we examined the utility of MOVA for analysis of several causative genes of ALS. METHODS: We analyzed variants of 12 ALS-related genes (TARDBP, FUS, SETX, TBK1, OPTN, SOD1, VCP, SQSTM1, ANG, UBQLN2, DCTN1, and CCNF) and classified them as pathogenic or neutral. For each gene, the features of the variants, consisting of their positions in the 3D structure predicted by AlphaFold2, the pLDDT score, and the BLOSUM62 score, were used to train a random forest, which was evaluated by stratified fivefold cross-validation. We compared how accurately MOVA predicted variant pathogenicity with other in silico prediction methods and evaluated the prediction accuracy at TARDBP and FUS hotspots. We also examined which of the MOVA features had the greatest impact on pathogenicity discrimination. RESULTS: MOVA yielded useful results (AUC ≥ 0.70) for 5 of the 12 ALS causative genes: TARDBP, FUS, SOD1, VCP, and UBQLN2. In addition, when prediction accuracy was compared with that of other in silico prediction methods, MOVA obtained the best results among those compared for TARDBP, VCP, UBQLN2, and CCNF. MOVA demonstrated superior predictive accuracy for the pathogenicity of mutations at hotspots of TARDBP and FUS. Moreover, higher accuracy was achieved by combining MOVA with REVEL or CADD. Among the features of MOVA, the x, y, and z coordinates performed the best and were highly correlated with MOVA. CONCLUSIONS: MOVA is useful for predicting the pathogenicity of rare variants that are concentrated at specific structural sites, and for use in combination with other prediction methods. © 2023. The Author(s). 
650 0 4 |a AlphaFold2 
650 0 4 |a Amyotrophic lateral sclerosis 
650 0 4 |a Missense variant 
650 0 4 |a MOVA 
650 0 4 |a Prediction tool 
700 1 0 |a Hatano, Y.  |e author 
700 1 0 |a Ishihara, T.  |e author 
700 1 0 |a Onodera, O.  |e author 
773 |t BMC bioinformatics
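
Illustrative note on the evaluation protocol named in the abstract (stratified fivefold cross-validation scored by AUC). The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `stratified_folds`, `roc_auc`, `cross_validated_auc`, and `fit_centroid` are hypothetical, and `fit_centroid` is a toy stand-in for the random forest that scores a variant by its distance from the centroid of pathogenic training variants in (x, y, z) space.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(features, labels, fit, k=5, seed=0):
    """Mean AUC over k stratified folds; fit(train_X, train_y) must return a scoring function."""
    aucs = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        score = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        test_y = [labels[i] for i in test_idx]
        test_scores = [score(features[i]) for i in test_idx]
        aucs.append(roc_auc(test_y, test_scores))
    return sum(aucs) / len(aucs)


def fit_centroid(train_X, train_y):
    """Toy placeholder classifier (assumption): negative distance to the pathogenic centroid."""
    pos = [x for x, y in zip(train_X, train_y) if y == 1]
    centroid = [sum(col) / len(pos) for col in zip(*pos)]
    return lambda x: -(sum((a - b) ** 2 for a, b in zip(x, centroid)) ** 0.5)
```

Each fold needs at least one pathogenic and one neutral variant for the AUC to be defined, so a gene with fewer than five variants in either class cannot be evaluated with k=5 in this sketch. Any classifier that returns a per-variant score can be passed as `fit` in place of the placeholder.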