Comparative Analysis of Tools and Approaches for Source Tracking Listeria monocytogenes in a Food Facility Using Whole-Genome Sequence Data

As WGS is increasingly used by the food industry to characterize pathogen isolates, users are challenged by the variety of analysis approaches available, ranging from methods that require extensive bioinformatics expertise to commercial software packages. This study aimed to assess the impact of analysi...


Bibliographic Details
Main Authors: Balamurugan Jagadeesan, Leen Baert, Martin Wiedmann, Renato H. Orsi
Format: Article
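Language: English
Published: Frontiers Media S.A. 2019-05-01
Series: Frontiers in Microbiology
Subjects: Listeria monocytogenes (L. monocytogenes); whole genome sequence (WGS); high quality single nucleotide polymorphism (hqSNP); whole genome MLST (wgMLST); core genome MLST (cgMLST); CFSAN pipeline
Online Access: https://www.frontiersin.org/article/10.3389/fmicb.2019.00947/full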
id doaj-0028de367112496a847e9e18693c6a10
record_format Article
spelling doaj-0028de367112496a847e9e18693c6a10; 2020-11-25T00:30:02Z; eng; Frontiers Media S.A.; Frontiers in Microbiology; 1664-302X; 2019-05-01; 10; 10.3389/fmicb.2019.00947; 446374
Balamurugan Jagadeesan (0): Nestlé Institute of Food Safety and Analytical Sciences, Nestlé Research, Lausanne, Switzerland
Leen Baert (1): Nestlé Institute of Food Safety and Analytical Sciences, Nestlé Research, Lausanne, Switzerland
Martin Wiedmann (2): Department of Food Science, Cornell University, Ithaca, NY, United States
Renato H. Orsi (3): Department of Food Science, Cornell University, Ithaca, NY, United States
collection DOAJ
language English
format Article
sources DOAJ
author Balamurugan Jagadeesan
Leen Baert
Martin Wiedmann
Renato H. Orsi
title Comparative Analysis of Tools and Approaches for Source Tracking Listeria monocytogenes in a Food Facility Using Whole-Genome Sequence Data
publisher Frontiers Media S.A.
series Frontiers in Microbiology
issn 1664-302X
publishDate 2019-05-01
description As WGS is increasingly used by the food industry to characterize pathogen isolates, users are challenged by the variety of analysis approaches available, ranging from methods that require extensive bioinformatics expertise to commercial software packages. This study aimed to assess the impact of analysis pipelines (i.e., different hqSNP pipelines, a cg/wgMLST pipeline) and of reference genome selection on analysis results (i.e., hqSNP and allelic differences as well as tree topologies) and on the conclusions drawn. For these comparisons, whole genome sequences were obtained for 40 Listeria monocytogenes isolates collected over 18 years from a cold-smoked salmon facility and 2 other isolates obtained from different facilities as part of academic research activities; WGS data were analyzed with three hqSNP pipelines and two MLST pipelines. After initial clustering using a k-mer based approach, hqSNP pipelines were run using two types of reference genomes: (i) closely related closed genomes (“closed references”) and (ii) high-quality de novo assemblies of the dataset isolates (“draft references”). All hqSNP pipelines identified similar hqSNP difference ranges among isolates in a given cluster; use of different reference genomes had minimal impact on the hqSNP differences identified between isolate pairs. Allelic differences obtained by wgMLST showed ranges similar to those of hqSNP differences among isolates in a given cluster; cgMLST consistently showed fewer differences than wgMLST. However, phylogenetic trees and dendrograms based on hqSNP and cg/wgMLST data did show some incongruences, typically linked to clades supported by low bootstrap values in the trees. When a hqSNP cutoff was used to classify isolates as “related” or “unrelated,” use of different pipelines yielded a considerable number of discordances; this finding indicates that cutoff values are valuable as a starting point for an investigation, but that supporting and epidemiological evidence should be used to interpret WGS data. Overall, our data suggest that cgMLST-based data analyses provide appropriate subtype differentiation and can be used without the need for preliminary data analyses (e.g., k-mer based clustering) or external closed reference genomes, simplifying data analysis needs. hqSNP or wgMLST analyses can then be performed on the isolate clusters identified by cgMLST to increase the precision in determining the genomic similarity between isolates.
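The cutoff-based comparison described in the abstract can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the isolate names, allele profiles, hqSNP counts, and the cutoff of 20 are made-up values for illustration, not data or thresholds from the study, and the functions are not part of any pipeline named in the record. It shows how allelic differences between cgMLST-style profiles might be counted, and how two pipelines that report similar hqSNP ranges can still disagree once a fixed cutoff is applied.

```python
from itertools import combinations

# Hypothetical cgMLST-style profiles: locus -> allele number (None = no call).
profiles = {
    "isolate_A": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 7},
    "isolate_B": {"locus1": 1, "locus2": 4, "locus3": 3, "locus4": 7},
    "isolate_C": {"locus1": 2, "locus2": 5, "locus3": 3, "locus4": None},
}


def allelic_differences(p, q):
    """Count loci called in both profiles whose allele numbers differ."""
    shared = [loc for loc in p
              if loc in q and p[loc] is not None and q[loc] is not None]
    return sum(1 for loc in shared if p[loc] != q[loc])


def classify(distances, cutoff):
    """Label each isolate pair 'related' if its distance is <= cutoff."""
    return {pair: ("related" if d <= cutoff else "unrelated")
            for pair, d in distances.items()}


def discordant_pairs(calls_x, calls_y):
    """Return the pairs that two pipelines classify differently."""
    return sorted(pair for pair in calls_x if calls_x[pair] != calls_y[pair])


# Pairwise allelic differences across all isolate pairs.
allele_dist = {
    (a, b): allelic_differences(profiles[a], profiles[b])
    for a, b in combinations(sorted(profiles), 2)
}

# Hypothetical hqSNP differences for the same pairs from two pipelines.
pipeline_x = {("isolate_A", "isolate_B"): 2,
              ("isolate_A", "isolate_C"): 21,
              ("isolate_B", "isolate_C"): 19}
pipeline_y = {("isolate_A", "isolate_B"): 3,
              ("isolate_A", "isolate_C"): 18,
              ("isolate_B", "isolate_C"): 22}

CUTOFF = 20  # illustrative value only
discordant = discordant_pairs(classify(pipeline_x, CUTOFF),
                              classify(pipeline_y, CUTOFF))
```

In this toy data the two pipelines differ by only a few hqSNPs per pair, yet two of the three pairs fall on opposite sides of the cutoff, which is the kind of discordance the abstract cautions about.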
topic Listeria monocytogenes (L. monocytogenes)
whole genome sequence (WGS)
high quality single nucleotide polymorphism (hqSNP)
whole genome MLST (wgMLST)
core genome MLST (cgMLST)
CFSAN pipeline
url https://www.frontiersin.org/article/10.3389/fmicb.2019.00947/full