A Snapshot of SARS-CoV-2 Genome Availability up to April 2020 and its Implications: Data Analysis


Bibliographic Details
Main Authors: Mavian, Carla; Marini, Simone; Prosperi, Mattia; Salemi, Marco
Format: Article
Language: English
Published: JMIR Publications, 2020-06-01
Series: JMIR Public Health and Surveillance, vol. 6, no. 2, article e19170
ISSN: 2369-2960
DOI: 10.2196/19170
Online Access: http://publichealth.jmir.org/2020/2/e19170/

Description

Background: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in scientific journals and through non–peer-reviewed channels to investigate the genetic heterogeneity and spatiotemporal dissemination of SARS-CoV-2. However, a systematic investigation of phylogenetic information and sampling bias in the available data is lacking. Although the number of available genome sequences of SARS-CoV-2 is growing daily and the sequences show increasing phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution.

Objective: The objective of this study was to determine the quality of the currently available SARS-CoV-2 full genome data in terms of sampling bias as well as phylogenetic and temporal signals to inform and guide the scientific community.

Methods: We used maximum likelihood–based methods to assess the presence of sufficient information for robust phylogenetic and phylogeographic studies in several SARS-CoV-2 sequence alignments assembled from GISAID (Global Initiative on Sharing All Influenza Data) data released between March and April 2020.

Results: Although the number of high-quality full genomes is growing daily and sequence data released in April 2020 contain sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 data sets still present severe limitations.

Conclusions: At the present time, studies assessing within-country spread or transmission clusters should be considered preliminary or hypothesis-generating at best. Hence, current reports should be interpreted with caution, and concerted efforts should continue to increase the number and quality of sequences required for robust tracing of the epidemic.