Genome-scale identification of Legionella pneumophila effectors using a machine learning approach.

Bibliographic Details
Main Authors: David Burstein, Tal Zusman, Elena Degtyar, Ram Viner, Gil Segal, Tal Pupko
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2009-07-01
Series: PLoS Pathogens
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19593377/?tool=EBI
DOI: 10.1371/journal.ppat.1000508
Citation: PLoS Pathogens 5(7): e1000508
Source: DOAJ
ISSN: 1553-7366, 1553-7374

Abstract

A large number of highly pathogenic bacteria utilize secretion systems to translocate effector proteins into host cells. Using these effectors, the bacteria subvert host cell processes during infection. Legionella pneumophila translocates effectors via the Icm/Dot type-IV secretion system and to date, approximately 100 effectors have been identified by various experimental and computational techniques. Effector identification is a critical first step towards the understanding of the pathogenesis system in L. pneumophila as well as in other bacterial pathogens. Here, we formulate the task of effector identification as a classification problem: each L. pneumophila open reading frame (ORF) was classified as either effector or not. We computationally defined a set of features that best distinguish effectors from non-effectors. These features cover a wide range of characteristics including taxonomical dispersion, regulatory data, genomic organization, similarity to eukaryotic proteomes and more. Machine learning algorithms utilizing these features were then applied to classify all the ORFs within the L. pneumophila genome. Using this approach we were able to predict and experimentally validate 40 new effectors, reaching a success rate of above 90%. Increasing the number of validated effectors to around 140, we were able to gain novel insights into their characteristics. Effectors were found to have low G+C content, supporting the hypothesis that a large number of effectors originate via horizontal gene transfer, probably from their protozoan host. In addition, effectors were found to cluster in specific genomic regions. Finally, we were able to provide a novel description of the C-terminal translocation signal required for effector translocation by the Icm/Dot secretion system. To conclude, we have discovered 40 novel L. pneumophila effectors, predicted over a hundred additional highly probable effectors, and shown the applicability of machine learning algorithms for the identification and characterization of bacterial pathogenesis determinants.
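The abstract frames effector identification as classifying each ORF from computed per-ORF features, and reports that effectors tend to have low G+C content. As a minimal sketch, assuming ORFs are given as plain nucleotide strings, the Python below computes G+C content per ORF and applies a hypothetical single-feature rule that flags ORFs below an assumed genome-wide value. It only illustrates that one feature; it is not the authors' machine learning pipeline, and the function names, ORF names, sequences, and threshold are all invented for illustration.

```python
from typing import Dict


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


def flag_low_gc(orfs: Dict[str, str], genome_gc: float) -> Dict[str, bool]:
    """Hypothetical single-feature rule: True for ORFs whose G+C content is
    below the supplied genome-wide value. Illustrative only; the paper combines
    many features with machine learning classifiers."""
    return {name: gc_content(seq) < genome_gc for name, seq in orfs.items()}


if __name__ == "__main__":
    # Toy sequences invented for illustration (not real L. pneumophila ORFs).
    toy_orfs = {"orfA": "ATGAAATTTTAA", "orfB": "ATGGCCGCGTAA"}
    # 0.38 is an assumed genome-wide G+C value used only for this example.
    print(flag_low_gc(toy_orfs, genome_gc=0.38))
```

In a real classifier this G+C value would be one column in a feature vector alongside the other characteristics the abstract lists, rather than a standalone threshold.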