Semicovariance Coefficient Analysis of Spike Proteins from SARS-CoV-2 and Other Coronaviruses for Viral Evolution and Characteristics Associated with Fatality

Complex modeling has received significant attention in recent years and is increasingly used to explain statistical phenomena with increasing and decreasing fluctuations, such as the similarity or difference of spike protein charge patterns of coronaviruses. Different from the existing covariance or...


Bibliographic Details
Main Authors: Jun Steed Huang, Jiamin Moran Huang, Wandong Zhang
Format: Article
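Language: English
Published: MDPI AG 2021-04-01
Series: Entropy
Subjects: fractional complex moment; SARS-CoV-2; coronaviruses; spike protein sequence; Pearson correlation coefficient; semicovariance coefficient
Online Access: https://www.mdpi.com/1099-4300/23/5/512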
id doaj-9fa6a1c5961045df9176d3f4a7eefc73
record_format Article
spelling doaj-9fa6a1c5961045df9176d3f4a7eefc73 2021-04-23T23:03:31Z eng
MDPI AG, Entropy, ISSN 1099-4300, 2021-04-01, vol. 23, no. 5, article 512, doi 10.3390/e23050512
Jun Steed Huang [0], Jiamin Moran Huang [1], Wandong Zhang [2]
[0] School of Information Technology, Carleton University, Ottawa, ON K1S 5B6, Canada
[1] Department of Computer Science, Jiangsu University, Suqian 223800, China
[2] Human Health Therapeutics Research Centre, National Research Council of Canada, 1200 Montreal Road, Building M54, Ottawa, ON K1A 0R6, Canada
collection DOAJ
language English
format Article
sources DOAJ
author Jun Steed Huang
Jiamin Moran Huang
Wandong Zhang
title Semicovariance Coefficient Analysis of Spike Proteins from SARS-CoV-2 and Other Coronaviruses for Viral Evolution and Characteristics Associated with Fatality
publisher MDPI AG
series Entropy
issn 1099-4300
publishDate 2021-04-01
description Complex modeling has received significant attention in recent years and is increasingly used to explain statistical phenomena with rising and falling fluctuations, such as the similarities and differences among the spike protein charge patterns of coronaviruses. Unlike existing covariance or correlation coefficient methods, which are constructed in traditional integer dimensions, this study proposes a simplified, novel fractional-dimension derivation together with an exact algorithm implemented in Excel. The derivation extends covariance with a fractional central moment, yielding a complex covariance coefficient that improves on the Pearson correlation coefficient in that it can further depict nonlinear relationships. The spike protein sequences were obtained from the GenBank and GISAID databases. The representative examples used in this study are coronaviruses from pangolin, bat, canine, swine (three variants), feline, and tiger, together with SARS-CoV-1, MERS, and SARS-CoV-2 (strains from Wuhan, Beijing, New York, and Germany, and the UK variant B.1.1.7). By examining the values above and below the mean of the positive and negative charge patterns of the amino acid residues in each spike protein, the proposed algorithm provides insight into the nonlinear evolutionary trends of spike proteins, which helps in understanding viral evolution and in identifying protein characteristics associated with viral fatality. The calculated results demonstrate that the complex covariance coefficient can distinguish subtle nonlinear differences in spike protein charge patterns relative to the Wuhan strain of SARS-CoV-2, differences that the Pearson correlation coefficient may overlook. The analysis reveals, for each virus, the unique positions at which its convergent (positively correlated) and divergent (negatively correlated) domains are centered.
The convergent, or conserved, region may be critical to viral stability or viability, whereas the divergent region is highly variable between coronaviruses, suggesting a high frequency of mutations in this region. The analyses show that the center of the conserved region of the spike protein is located at amino acid residue 900 in SARS-CoV-1, shifts to residue 700 in MERS, and then to residue 600 in SARS-CoV-2, indicating the evolution of the coronaviruses. Interestingly, in the SARS-CoV-2 variant B.1.1.7 the center of the conserved region shifted back to residue 700, suggesting that this variant is more virulent than the original SARS-CoV-2 strain. Another important characteristic revealed by the study is that the distance between the divergent mean and the maximal divergent point in each virus (MERS > SARS-CoV-1 > SARS-CoV-2) is proportional to the viral fatality rate. This algorithm may help in understanding and analyzing the evolving trends and critical characteristics of SARS-CoV-2 variants and of other coronaviral proteins and viruses.
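The abstract does not state the formula, so the following is only a minimal sketch of the general idea, under loudly stated assumptions. It encodes residues as charges (K, R = +1; D, E = -1; others 0, an assumed encoding), then compares the ordinary Pearson coefficient with a hypothetical half-order central cross-moment: the square root of each deviation from the mean is taken in the complex plane, so values below the mean become imaginary. The function names and the normalization are illustrative choices, not the authors' Excel algorithm.

```python
import cmath
import math

# Assumed residue charges at neutral pH (illustrative encoding, not taken from the paper).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def charge_pattern(sequence):
    """Map an amino acid sequence to a list of +1 / -1 / 0 charges."""
    return [CHARGE.get(residue, 0) for residue in sequence.upper()]


def _deviations(x, y):
    """Return the deviations of x and y from their own means."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be aligned to the same non-zero length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return [a - mean_x for a in x], [b - mean_y for b in y]


def pearson(x, y):
    """Ordinary Pearson correlation coefficient (second-order moments)."""
    dx, dy = _deviations(x, y)
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def half_order_coefficient(x, y):
    """Hypothetical complex coefficient built from half-order central moments.

    Each deviation d is replaced by its complex square root, so d < 0 gives an
    imaginary value. With a Hermitian product, positions where both sequences
    lie on the same side of their means add to the real part; positions on
    opposite sides add (with sign) to the imaginary part.
    """
    dx, dy = _deviations(x, y)
    norm = math.sqrt(sum(abs(a) for a in dx) * sum(abs(b) for b in dy))
    if norm == 0:
        raise ValueError("a constant sequence has no defined coefficient")
    total = sum(cmath.sqrt(a) * cmath.sqrt(b).conjugate() for a, b in zip(dx, dy))
    return total / norm


# Made-up toy peptides of equal length, not real spike protein sequences.
reference = charge_pattern("MKDERAKD")
variant = charge_pattern("MKDARAKE")
print("Pearson:", pearson(reference, variant))
print("Half-order (complex):", half_order_coefficient(reference, variant))
```

Real spike proteins differ in length, so the sequences would first have to be aligned (or compared window by window) before either coefficient is meaningful; that step is omitted here. By the Cauchy-Schwarz inequality the magnitude of the sketched coefficient never exceeds 1, and it equals 1 when a sequence is compared with itself.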
topic fractional complex moment
SARS-CoV-2
coronaviruses
spike protein sequence
Pearson correlation coefficient
semicovariance coefficient
url https://www.mdpi.com/1099-4300/23/5/512